




A THEORY OF SOME MULTIPLE DECISION PROBLEMS, I¹


By E. L. Lehmann
University of California, Berkeley 


Summary. A class of multiple decision procedures is described and its members 
are shown to possess uniformly minimum risk among all procedures that are un- 
biased with respect to a certain loss function. This provides a justification for a 
number of procedures considered by Tukey, Duncan, and others, for certain 
classes of point estimates, and for some nonparametric decision procedures based 
on sample cumulative distribution functions and related to tests of the Kolmo- 
goroff-Smirnoff type. 


1. Introduction. As has frequently been pointed out, many statistical situa- 
tions, which it is customary to treat by means of hypothesis testing, really involve 
a choice between more than two decisions. In such problems, when the hypothesis 
is rejected, one wants to know in which of a number of possible ways the actual 
situation differs from the one postulated by the hypothesis. By formulating the 
problem as one involving only two decisions one not only neglects to differentiate 
between certain alternative decisions, which may differ considerably in their 
consequences, but one may also be led to an inappropriate acceptance region for 
the hypothesis. 

As an example suppose that $X$ and $Y$ are independently normally distributed
with unit variance and means $\xi$ and $\eta$. While there are situations in which one
only wishes to determine whether the hypothesis $H: \xi = \eta = 0$ is true or not,
it is perhaps more common that in case of rejection one will want to know
whether it is $\xi$ or $\eta$ that is different from 0, or both, and of the nonzero means
whether they are positive or negative. Here the first formulation implies
complete spherical symmetry between the alternatives, and the appropriate
acceptance region is

$$x^2 + y^2 \le c.$$

On the other hand, when the choice lies between the nine indicated decisions,
it seems most natural to accept $H$ when

$$\max(|x|, |y|) \le k,$$


and in case of rejection to divide the rejection region into the eight subregions 
shown in Fig. 1 corresponding to the eight possible alternative decisions. This 
procedure actually will be justified later in terms of a specific loss function. 
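
As a rough illustration of the nine-decision procedure just described, the following Python sketch classifies each mean separately by comparing the observation with the cutoffs -k and k. The function names and string labels are hypothetical, and the sketch assumes that the eight rejection subregions arise from classifying each coordinate on its own; the figure itself is not reproduced here.

```python
def classify_mean(value, k):
    """Classify one mean as negative, zero, or positive using the cutoff k."""
    if value > k:
        return "positive"
    if value < -k:
        return "negative"
    return "zero"


def nine_decision(x, y, k):
    """Return the pair of statements about the two means induced by (x, y)."""
    return classify_mean(x, k), classify_mean(y, k)


def accepts_h(x, y, k):
    """H (both means zero) is accepted exactly when max(|x|, |y|) <= k."""
    return max(abs(x), abs(y)) <= k
```

Under this reading, accepting H is the same event as classifying both means as zero, so the square acceptance region is recovered as one of the nine cells.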


Received February 8, 1956. 


1 Work done while the author was a Fellow of the John Simon Guggenheim Memorial 
Foundation. 







One of the attractions of formulating statistical problems in terms of hy- 
pothesis testing is the resulting structural simplicity. However, at the same time 
this reduction to a choice between only two decisions frequently causes complica- 
tions by creating a class of alternatives which combines too many different ele- 
ments. In many such cases, if one is willing to forego structural simplicity and 
to divide the class of alternatives into its natural components, one obtains a 
multiple decision problem, which admits a simpler and more natural solution 
than the apparently less complex testing problem. 

As an example consider the comparison of $k$ variances $\sigma_1^2, \ldots, \sigma_k^2$ on the basis
of samples $X_{ij}$ ($j = 1, \ldots, n$; $i = 1, \ldots, k$) from normal populations $N(\xi_i, \sigma_i^2)$.
There does not seem to exist any really convincing solution to the small sample
problem of testing the hypothesis $H: \sigma_1 = \cdots = \sigma_k$. On the other hand, there
exists a natural multiple decision problem based on the comparisons of the
different pairs $(\sigma_i, \sigma_j)$. The associated acceptance region for $H$ is the one
discussed by Hartley [6].

In the present paper, a general class of multiple decision problems is described 
together with procedures that seem appropriate for these problems. The method 
of constructing the procedures is not new, and is in fact the one used in most 
cases of multiple comparisons treated in the literature. It was mentioned ex- 
plicitly in 1950 by Howard Levene in a seminar lecture at Columbia University, 
and was recently stated, with only a minor difference, by Duncan in [3]. It is 
also closely related to a principle of test construction proposed by Roy [12], 
and utilized further by Bose and Roy [2]. As will be shown, the method is appli- 
cable not only to the typical multiple comparison problems, but also to problems 
of point estimation and various nonparametric problems.

The main purpose of the present paper is to prove an optimum property of 
the above procedures, namely that they are unbiased (in a sense introduced by 
the author in [9]), and that among all unbiased procedures they uniformly
minimize the risk. It should be mentioned that in the application to specific cases a
number of distributional problems arise, concerning error control and other 
properties of the operating characteristic of the procedures. These problems, 
which require separate treatment for each example, are not considered here but 
instead the paper is concerned only with general aspects of the method. In 
particular, the main result involves assumptions concerning the distribution of 
the observable random variables only indirectly in that the existence of tests of 
related hypotheses H is assumed, which possess certain optimum properties. 
Apart from this, the conditions concern only the structure of the multiple decision 
problems in terms of the hypotheses H. 


2. A class of multiple decision procedures. Let $X$ be a random observable,
the distribution of which depends on the parameter $\theta$, and suppose that a number
of different hypotheses concerning $\theta$ are of interest, say $H_\gamma: \theta \in \omega_\gamma$, $\gamma \in \Gamma$. The
class of alternatives $K_\gamma$ to $H_\gamma$ is that $\theta$ lies in the complement of $\omega_\gamma$, which will
be denoted by $\omega_\gamma^{-1}$. When considering these hypotheses simultaneously one will
wish to determine whether all of them are true, all of them false, or in the
intermediate cases, exactly which of the hypotheses hold and which do not.

One is therefore faced with a multiple decision problem in which the different
possible decisions correspond to the statements that a certain set of the
hypotheses $H_\gamma$ is correct while the remaining ones are false, or equivalently, that $\theta$ lies
in a certain set, say $\Omega_i$, $i \in I$, determined by these conditions. The sets $\Omega_i$, which
are the atoms of the field of sets generated by the sets $\omega_\gamma$, are formally given by

$$\Omega_i = \bigcap_{\gamma \in \Gamma} \omega_\gamma^{x_{i\gamma}}, \tag{2.1}$$

where the $x$'s indicate which of the hypotheses are true ($x_{i\gamma} = 1$) and which are
false ($x_{i\gamma} = -1$) for the given $\Omega_i$. If, as is frequently the case, some of the
intersections formally defined by (2.1) are empty, we shall restrict the $\Omega$'s to
denote the nonempty ones, and shall require that none of the possible decisions
should correspond to the empty intersections. Authors dealing with specific
multiple comparison problems have frequently not insisted on this restriction,
and this is also the point in which Duncan's definition, referred to above,
differs from the one given here.
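
To make the role of the nonempty atoms in (2.1) concrete, here is a minimal Python sketch that groups the points of a finite parameter grid by their pattern of true and false hypotheses. The function name, the grid, and the two example hypotheses are assumptions chosen for illustration; a finite grid can only approximate the parameter space, so a pattern that is nonempty in the full space may be missed when the grid is too coarse.

```python
def nonempty_atoms(hypotheses, parameter_grid):
    """Group grid points by sign pattern (1 if the hypothesis holds, else -1)."""
    atoms = {}
    for theta in parameter_grid:
        pattern = tuple(1 if holds(theta) else -1 for holds in hypotheses)
        atoms.setdefault(pattern, []).append(theta)
    return atoms


# Hypothetical example: H_1: theta >= 0 and H_2: theta <= 0.
example_hypotheses = [lambda t: t >= 0, lambda t: t <= 0]
example_atoms = nonempty_atoms(example_hypotheses, [-1.0, 0.0, 1.0])
```

In this example the pattern (-1, -1), which would assert that both hypotheses are false, never occurs, so it is excluded from the list of possible decisions.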

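To specify a loss function, suppose that the losses for the individual testing
problems are $a_\gamma$ and $b_\gamma$ for falsely rejecting and accepting the hypothesis $H_\gamma$,
and that in the simultaneous consideration of these problems the losses are
additive. If then $\theta \in \Omega_i$ and the decision $d_k$ is taken that $\theta \in \Omega_k$, the resulting loss is

$$w_{ik} = \sum_{\gamma} (\varepsilon_{ik\gamma} a_\gamma + \varepsilon_{ki\gamma} b_\gamma), \tag{2.2}$$

where

$$\varepsilon_{ik\gamma} = \begin{cases} 1 & \text{if } x_{i\gamma} = 1,\ x_{k\gamma} = -1, \\ 0 & \text{otherwise.} \end{cases} \tag{2.3}$$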










Formula (2.2) expresses the fact that $w_{ik}$ is the sum of all those $a_\gamma$ for which
$H_\gamma$ is true ($x_{i\gamma} = 1$) but rejected ($x_{k\gamma} = -1$), plus the sum of the $b_\gamma$ for which
$H_\gamma$ is falsely ($x_{i\gamma} = -1$) accepted ($x_{k\gamma} = 1$). The risk is thus simply a weighted
sum of the probabilities of error. Slightly more generally (taking the $a$'s and
$b$'s to be given) one may wish to put, in the case that $\Gamma$ is finite,

$$w_{ik} = \sum_{\gamma \in \Gamma} \nu_\gamma (\varepsilon_{ik\gamma} a_\gamma + \varepsilon_{ki\gamma} b_\gamma),$$

where the $\nu$'s are any positive weights, and in the general case

$$w_{ik} = \int (\varepsilon_{ik\gamma} a_\gamma + \varepsilon_{ki\gamma} b_\gamma)\, d\mu(\gamma). \tag{2.4}$$
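
The additive loss (2.2) can be sketched in a few lines of Python for a finite set of hypotheses. The function name and the representation of a decision as a tuple of signs (1 for "hypothesis true or accepted", -1 for "false or rejected") are assumptions made for this illustration.

```python
def additive_loss(x_true, x_decided, a, b):
    """Loss (2.2): a for each false rejection plus b for each false acceptance."""
    total = 0.0
    for xi, xk, a_g, b_g in zip(x_true, x_decided, a, b):
        if xi == 1 and xk == -1:      # hypothesis true but rejected
            total += a_g
        elif xi == -1 and xk == 1:    # hypothesis false but accepted
            total += b_g
    return total
```

For two hypotheses with common losses a = 2 and b = 3, deciding (-1, 1) when the truth is (1, -1) costs a + b = 5, since one hypothesis is falsely rejected and the other falsely accepted.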


Suppose now that attention is restricted to nonrandomized procedures. This
involves no essential loss of generality since it can be achieved for all decision
problems with which we shall be concerned, by adjoining to the original random
variables a continuous variable which is independent of them. Then a decision
procedure for the given multiple decision problem is a partition of the sample
space into sets $D_i$ such that, when the observation falls into $D_i$, the decision
$d_i: \theta \in \Omega_i$ is taken. It is natural to try to relate these decision procedures to the
tests of the hypotheses $H_\gamma$. Let $\mathcal{A}$ be a family of such tests with acceptance
regions $A_\gamma$ for $H_\gamma$ and rejection regions $A_\gamma^{-1}$, and consider the decision procedure
that $\mathcal{A}$ induces through the relation

$$D_i = \bigcap_{\gamma \in \Gamma} A_\gamma^{x_{i\gamma}}. \tag{2.5}$$


Here it may happen that

$$P_\theta\Bigl(\bigcup_{i \in I} D_i\Bigr) < 1.$$

This is the case when the union of those intersections $\bigcap_\gamma A_\gamma^{x_\gamma}$, for which the
corresponding intersection (2.1) is empty, has positive probability. The
partition (2.5) is then not a procedure for the given decision problem. When the
induced procedure satisfies

$$P_\theta\Bigl(\bigcup_{i \in I} D_i\Bigr) = 1 \quad \text{for all } \theta, \tag{2.6}$$

the family $\mathcal{A}$ is said to be compatible. Since the acceptances and rejections making
up the intersection (2.5) are consistent with each other if the corresponding
intersection (2.1) is nonempty, and are otherwise mutually contradictory,
compatibility is equivalent to the condition that the simultaneous application of
the tests $A_\gamma$ not lead to any inconsistencies.

In the case that $\Gamma$ is noncountable, a further complication is the possibility
that a set of measurable $A_\gamma$'s through (2.5) may give rise to a nonmeasurable
$D_i$. However, barring such measurability difficulties, which it is usually easy
to eliminate in specific problems, one has the following result.

THEOREM 1. Relation (2.5) defines a 1:1 correspondence between the decision
procedures for the problem induced by the hypotheses $H_\gamma$, $\gamma \in \Gamma$, and the compatible
families $\mathcal{A}$ of tests of these hypotheses.

PROOF. Clearly, if $\mathcal{A}$ is compatible, (2.5) does define a valid decision
procedure. Conversely there always exists one, and essentially only one, compatible
$\mathcal{A}$ leading to a given decision procedure. To see this, suppose first that $\bigcup D_i$ is
the whole sample space $\mathcal{X}$. The desired result is then a consequence of the fact
that (2.5) implies and is implied by

$$A_\gamma = \bigcup_{\{i :\, x_{i\gamma} = 1\}} D_i, \tag{2.7}$$

which we shall now prove.

Assume first that (2.5) holds, and let $x \in A_\gamma$. Since $x$ is in some $D_i$, it must be
in some $D_i$ with $x_{i\gamma} = 1$, and we have $A_\gamma \subset \bigcup_{\{i :\, x_{i\gamma} = 1\}} D_i$. On the other hand,

$$\bigcup_{\{i :\, x_{i\gamma} = 1\}} D_i = \bigcup_{\{i :\, x_{i\gamma} = 1\}} \bigcap_{\beta \in \Gamma} A_\beta^{x_{i\beta}} \subset A_\gamma.$$

Thus (2.5) implies (2.7).

Suppose conversely that (2.7) holds. Then $A_\gamma^{-1} = \bigcup_{\{i :\, x_{i\gamma} = -1\}} D_i$, and hence

$$\bigcap_{\gamma \in \Gamma} A_\gamma^{x_{k\gamma}} = \bigcap_{\gamma \in \Gamma} \bigcup_{\{i :\, x_{i\gamma} = x_{k\gamma}\}} D_i.$$

Now a point belongs to the set on the right-hand side of this equation if and only
if it lies in a set $D_i$ for which $x_{i\gamma} = x_{k\gamma}$ for all $\gamma$. But this holds if and only if
$i = k$, so that

$$\bigcap_{\gamma \in \Gamma} \bigcup_{\{i :\, x_{i\gamma} = x_{k\gamma}\}} D_i = D_k,$$

and (2.7) implies (2.5), as was to be proved.
In case

$$\bigcup D_i = \mathcal{X} - N,$$

where $N$ is a null set for all distributions $P_\theta$, relation (2.5) clearly holds also if
each $A_\gamma$ is replaced by $A_\gamma \cap (\mathcal{X} - N)$. Applying the correspondence just proved
to the space $\mathcal{X} - N$, we see that (2.7) is replaced by

$$A_\gamma \cap (\mathcal{X} - N) = \bigcup_{\{i :\, x_{i\gamma} = 1\}} D_i.$$

Thus the $D$'s determine the $A$'s on the set $\mathcal{X} - N$ and it is seen that, except on a
null set, (2.5) and (2.7) establish the desired 1:1 correspondence quite generally.

For later reference we also give the relationship of the multiple decision
procedure to the tests of the individual hypotheses in the case that they may be
randomized. Suppose, for this purpose only, that $\Gamma$ and $I$ are countable. The
tests are then described by means of critical functions $\varphi_\gamma$, where $\varphi_\gamma(x)$ denotes
the probability with which $H_\gamma$ is rejected when $x$ is observed. Similarly, a
procedure for the given multiple decision problem is a function $\psi$ of the two
arguments $i$ and $x$, the value $\psi_i(x)$ of which is the probability with which decision $d_i$
is taken when $x$ is observed. If $\mathcal{A} = \{\varphi_\gamma, \gamma \in \Gamma\}$ is a family of tests of the
hypotheses $H_\gamma$, equations (2.5) and (2.7) become


vi — IT = ‘ 


al 
where gy = 1 — 


3. Some classification problems. We shall now illustrate the concepts of the
previous section with some classes of examples. It will be shown later that for
many distributions of interest (normal, binomial, Poisson, etc.) the indicated
procedures possess the optimum property of uniformly minimizing the risk
among all unbiased procedures, provided the levels $\alpha_\gamma$ of the tests $\varphi_\gamma$ of $H_\gamma$ are
related to the losses $a_\gamma$ and $b_\gamma$ through the equation

$$\alpha_\gamma = b_\gamma / (a_\gamma + b_\gamma). \tag{3.1}$$

(i) Three-decision problems. Perhaps the simplest problem involving more
than two decisions, and one which has been previously treated in somewhat
similar terms by the author [8] and by Duncan [3], is that of deciding whether a
real-valued parameter $\theta$ is less than, equal to, or greater than a specified value
$\theta_0$. This may be generated by the hypotheses $H_1: \theta \ge \theta_0$, $H_2: \theta \le \theta_0$, which
according to (2.1) lead to the choice between the three parameter sets
$\Omega_0 = \omega_1 \omega_2: \theta = \theta_0$; $\Omega_1 = \omega_1 \omega_2^{-1}: \theta > \theta_0$; $\Omega_2 = \omega_1^{-1} \omega_2: \theta < \theta_0$. If the losses $a$ and $b$
of false rejection and acceptance are taken to be the same in the two component
problems, the loss function is given by

$$\begin{array}{l|ccc}
 & d_2 & d_0 & d_1 \\ \hline
\theta < \theta_0 & 0 & b & a + b \\
\theta = \theta_0 & a & 0 & a \\
\theta > \theta_0 & a + b & b & 0
\end{array}$$

Here the greatest weight is attached to the losses resulting from taking decision
$d_1$ when $d_2$ is correct and vice versa, which are neglected in the usual formulation
of testing $H: \theta = \theta_0$.

In several cases of interest, for example those of a binomial or Poisson
population, or of a normal population with $\theta$ either the mean or the variance where
the other parameter may be unknown, the tests $\varphi_1$ and $\varphi_2$ of $H_1$ and $H_2$ depend
on a common statistic $T$, and have rejection regions $T \le C_1$ for $H_1$ and $T \ge C_2$
for $H_2$, where the constants are determined by

$$P_{\theta_0}\{T \le C_1\} = P_{\theta_0}\{T \ge C_2\} = \alpha. \tag{3.2}$$


Of the intersections $\omega_1^{x_1} \omega_2^{x_2}$, only $\omega_1^{-1} \omega_2^{-1}$ is empty. The corresponding action,
which would consist in the simultaneous rejection of both hypotheses, is
impossible when $\alpha < \frac{1}{2}$ since then $C_1 < C_2$. Subject to this restriction, the pair of
tests $(\varphi_1, \varphi_2)$ is therefore compatible. The induced decision procedure consists
simply in applying the "equal-tails" test of the hypothesis $H: \theta = \theta_0$ at level
$2\alpha$ and drawing the indicated conclusions. It is of interest to note that this
two-sided test usually does not coincide with the standard unbiased test to which
one is led by combining the two decisions $d_1$ and $d_2$. If the losses are not the
same for the two hypotheses, but are $a_i$, $b_i$ for $H_i$, equation (3.2) is replaced by

$$P_{\theta_0}\{T \le C_1\} = \alpha_1, \qquad P_{\theta_0}\{T \ge C_2\} = \alpha_2,$$

and the condition for compatibility becomes $\alpha_1 + \alpha_2 < 1$.
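
The three-decision procedure above can be sketched for one concrete case. The Python example below assumes that $T$ is normally distributed with mean $\theta$ and a known standard error, which is a simplification chosen here for illustration (the text also covers binomial, Poisson, and unknown-variance cases). The function names are hypothetical; the levels are tied to the losses through relation (3.1).

```python
from statistics import NormalDist


def level_from_losses(a, b):
    """Level suggested by relation (3.1): alpha = b / (a + b)."""
    return b / (a + b)


def three_decision(t, theta0, std_error, alpha1, alpha2):
    """Classify theta as below, equal to, or above theta0 from the statistic t."""
    if not (alpha1 > 0 and alpha2 > 0 and alpha1 + alpha2 < 1):
        raise ValueError("compatibility requires alpha1 + alpha2 < 1")
    null_dist = NormalDist(mu=theta0, sigma=std_error)
    c1 = null_dist.inv_cdf(alpha1)       # P(T <= c1) = alpha1 at theta0
    c2 = null_dist.inv_cdf(1 - alpha2)   # P(T >= c2) = alpha2 at theta0
    if t <= c1:
        return "theta < theta0"   # lower tail: "theta >= theta0" is rejected
    if t >= c2:
        return "theta > theta0"   # upper tail: "theta <= theta0" is rejected
    return "theta = theta0"
```

With equal levels the two cutoffs are symmetric about $\theta_0$ in this normal case, and the rule is the equal-tails test at level $2\alpha$ followed by a directional statement.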

Frequently one wishes to determine not whether $\theta$ is exactly equal to $\theta_0$ but
only whether this equality holds approximately. The corresponding
three-decision problem of deciding whether $\theta < \theta_1$, $\theta_1 \le \theta \le \theta_2$ or $\theta > \theta_2$ (where
$\theta_1 < \theta_0 < \theta_2$), can be generated by the hypotheses $H_1: \theta \ge \theta_1$ and $H_2: \theta \le \theta_2$.
The tests $\varphi_1$ and $\varphi_2$ then have the same form as before with the constants $C_1$,
$C_2$ determined by

$$P_{\theta_1}\{T \le C_1\} = \alpha_1, \qquad P_{\theta_2}\{T \ge C_2\} = \alpha_2.$$

They are compatible with the given decision problem if $\alpha_1 + \alpha_2 < 1$, and if the
cumulative distribution function $F_\theta(c) = P_\theta\{T \le c\}$ is for each $c$ a decreasing
function of $\theta$. For we then have

$$C_1 = F_{\theta_1}^{-1}(\alpha_1) < F_{\theta_2}^{-1}(\alpha_1) \le F_{\theta_2}^{-1}(1 - \alpha_2) = C_2$$

and hence $C_1 < C_2$, which was seen above to be the condition for compatibility.
In the above two examples the assumptions concerning the form of the tests
$\varphi_1$ and $\varphi_2$ were unnecessarily restrictive. As an illustration of a somewhat more
general situation consider the case of two independent binomial variables $X_1$
and $X_2$, and the problem of deciding whether $p_2 < p_1$, $p_2 = p_1$ or $p_2 > p_1$.
The best unbiased tests of the one-sided hypotheses $p_2 \le p_1$ and $p_2 \ge p_1$ at
level $\alpha$ reject the hypotheses when $X_2$ is too large, respectively too small, on each
line segment $X_1 + X_2 = \text{const.}$, where the cutoff points are determined so that
the conditional probability in each tail is equal to $\alpha$. It is seen as before that this
pair of tests is compatible provided $\alpha < \frac{1}{2}$. The induced three-decision
procedure consists in performing the two-sided test conditionally on each line
segment as an "equal tails" test at level $2\alpha$, and then making the indicated
statement.

(ii) Classification of two independent parameters. Let $\xi$, $\eta$ be two real-valued
parameters, and consider the problem of classifying $\xi$ and $\eta$ as being $\le$ or $>$
$\xi_0$ and $\le$ or $>$ $\eta_0$ respectively, so that the choice lies between the four
parameter sets $\Omega_1: \xi \le \xi_0, \eta \le \eta_0$; $\Omega_2: \xi \le \xi_0, \eta > \eta_0$; $\Omega_3: \xi > \xi_0, \eta \le \eta_0$;
$\Omega_4: \xi > \xi_0, \eta > \eta_0$. This clearly can be generated by the hypotheses $H_1: \xi \le \xi_0$,
$H_2: \eta \le \eta_0$. The problem of compatibility does not arise here since none of the
intersections (2.1) is empty. The procedure consists in carrying out the two tests
separately, and combining the results in the obvious fashion.² Examples in which
this would be appropriate are that of a normal population $N(\xi, \sigma^2)$ where it is
desired to compare $\xi$ and $\sigma^2$ with $\xi_0$ and $\sigma_0^2$ respectively, or that of a bivariate
normal population where both means are to be compared with certain standards.

The method applies of course equally well when more than two parameters
are to be classified. An example is the preference ordering of $m$ objects by $n$
judges. If we assume that the judges constitute a sample from a population with
probability $p_{ij}$ of preferring object $i$ to object $j$, we can test each of the
hypotheses $H_{ij}: p_{ij} \le \frac{1}{2}$ by means of a sign test. The result of applying the set of all
of these tests is a judgment concerning each pair $(i, j)$ that either $i$ is preferred
to $j$, or $j$ to $i$, or that neither is preferred to the other. In general one will of
course not obtain a simple ordering but a complex comparison, which may be
represented by a preference polygon as shown for example in Fig. 1 of [7].
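
A minimal sketch of the pairwise sign tests for the judges example follows. It assumes that every judge expresses a preference (no ties), so that $p_{ji} = 1 - p_{ij}$, and it uses the nonrandomized sign test, whose actual level can fall below the nominal $\alpha$. The function names and labels are hypothetical.

```python
from math import comb


def upper_tail(k, n):
    """P(B >= k) for B distributed as Binomial(n, 1/2)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


def compare_pair(prefers_i, n_judges, alpha):
    """Judge a pair (i, j) from the number of judges preferring i to j."""
    if upper_tail(prefers_i, n_judges) <= alpha:
        return "i preferred to j"      # hypothesis p_ij <= 1/2 rejected
    if upper_tail(n_judges - prefers_i, n_judges) <= alpha:
        return "j preferred to i"      # hypothesis p_ji <= 1/2 rejected
    return "neither preferred"
```

For $\alpha < \frac{1}{2}$ the two rejections cannot occur together, so each pair receives exactly one of the three statements; applying the rule to all pairs gives the comparison structure described above.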

(iii) Comparing several populations. Let samples be given from $s$ populations
with distributions depending on the parameters $\theta_1, \ldots, \theta_s$ and possibly certain
nuisance parameters, which may or may not be common to the different
populations, and consider the hypotheses $H_{ij}: \theta_i \le \theta_j$. To be specific, let the
distributions be normal with means $\theta_i$ and common variance $\sigma^2$, and let the rejection
region for $H_{ij}$ be, in the usual notation, $\bar{X}_i - \bar{X}_j > C_{ij} S$ with

$$C_{ij} = C[(1/n_i) + (1/n_j)]^{1/2}.$$

We shall again assume that the level of the tests is less than $\frac{1}{2}$, so that the
constants $C_{ij}$ are positive. The procedure, which is essentially the one proposed by
Tukey in [14], leads to the decision $\theta_i = \theta_j$ when $|\bar{X}_i - \bar{X}_j| \le C_{ij} S$, and it is
seen that the system is not compatible since with positive probability

$$|\bar{X}_i - \bar{X}_j| \le C_{ij} S, \qquad |\bar{X}_j - \bar{X}_k| \le C_{jk} S,$$

but $|\bar{X}_i - \bar{X}_k| > C_{ik} S$, while the associated parameter sets $\theta_i = \theta_j$, $\theta_j = \theta_k$
and $\theta_i \ne \theta_k$ have an empty intersection. A justification of the resulting
inconsistencies may be obtained if one interprets the acceptance of a hypothesis
sufficiently loosely. For such inconsistencies occur only if at least one of the
hypotheses is accepted. If, for example, $H_{ij}$ and $H_{jk}$ are both rejected, we have

$$\bar{X}_i - \bar{X}_k > (C_{ij} + C_{jk}) S \ge C_{ik} S.$$

Therefore, $H_{ik}$ is then also rejected, corresponding to the fact that $\theta_i > \theta_j$,
$\theta_j > \theta_k$ implies $\theta_i > \theta_k$.
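
The incompatibility just described is easy to exhibit numerically. The sketch below assumes equal sample sizes, so that a single threshold (standing for the common value of $C_{ij} S$) applies to every pair; the function names and the example numbers are hypothetical.

```python
from itertools import combinations


def pairwise_decisions(means, threshold):
    """Declare two means equal when they differ by at most the threshold."""
    decisions = {}
    for i, j in combinations(range(len(means)), 2):
        diff = means[i] - means[j]
        if abs(diff) <= threshold:
            decisions[(i, j)] = "="
        else:
            decisions[(i, j)] = ">" if diff > 0 else "<"
    return decisions


def inconsistent_triples(decisions, size):
    """Triples with two declared equalities and one declared inequality."""
    bad = []
    for i, j, k in combinations(range(size), 3):
        marks = [decisions[(i, j)], decisions[(j, k)], decisions[(i, k)]]
        if marks.count("=") == 2:
            bad.append((i, j, k))
    return bad
```

With sample means 0, 1, 2 and threshold 1.5, the first and second populations are declared equal, as are the second and third, while the first and third are declared different. This is exactly the kind of statement that corresponds to an empty parameter set.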


A perhaps more satisfactory solution is obtained if one replaces the hypotheses
$H_{ij}$ by $H'_{ij}: \theta_i \le \theta_j + \Delta$ ($\Delta > 0$), with rejection regions $\bar{X}_i - \bar{X}_j > C'_{ij} S$,
where the constants are determined so that the probability of rejection is $\alpha$
when $\theta_i = \theta_j + \Delta$. It is easily checked that this system is compatible since
$H'_{ik}$ may be false when both $H'_{ij}$ and $H'_{jk}$ are true. Each difference $\theta_i - \theta_j$ is
now classified as being either $< -\Delta$, between $-\Delta$ and $\Delta$, or $> \Delta$. Since the
significance statements obtained for the differences $\theta_i - \theta_j$ in this manner are
self-consistent, they lead to a classification of the $s$ populations of the kind described
by Duncan in Section 3 of [3].


2 A different justification of the resulting acceptance region for the combined hypothesis
was given by the author in [10].

As another example consider samples of equal size from $N(\xi_i, \sigma_i^2)$, and the
problem of classifying the populations according to their variances. This can be
generated by the hypotheses $H_{ij}: \sigma_i \le \sigma_j$, for which the rejection regions are
$S_i/S_j > C$. In particular the decision that all of the $\sigma$'s are equal is taken when
$\max(S_i/S_j) \le C$, so that the present procedure constitutes a refinement of the
test for equality of variances discussed by Hartley [6]. For the same reason as
in the preceding example the system is incompatible. But for $\alpha < \frac{1}{2}$
inconsistencies can again occur only if at least one acceptance is involved since the rejection
of both $H_{ij}$ and $H_{jk}$ implies $S_i/S_k > C^2 > C$, and hence the rejection of $H_{ik}$.
As before the inconsistencies may be avoided altogether by replacing the $H_{ij}$
by the hypotheses $H'_{ij}: \sigma_i \le \delta \sigma_j$ ($\delta > 1$), and in this way one obtains, as in the
case of the means, a satisfactory classification procedure for the variances.


4. Estimation. (i) Point estimation. Let $\theta$ be a continuous real-valued
parameter, and consider the decision problem generated by the set of hypotheses
$H(\theta_0): \theta \le \theta_0$. If $\theta^*$ denotes the true value of the parameter, the hypotheses
$H(\theta_0)$ with $\theta^* \le \theta_0$ are true while those with $\theta^* > \theta_0$ are false. The associated
intersection (2.1) is therefore

$$\bigcap_{\theta_0 \ge \theta^*} \omega(\theta_0) \cap \bigcap_{\theta_0 < \theta^*} \omega^{-1}(\theta_0). \tag{4.1}$$

Since $\omega(\theta_0)$ is the interval $\theta \le \theta_0$, the set (4.1) consists of the single point $\theta^*$,
or in the case that nuisance parameters are present, of the totality of points for
which $\theta = \theta^*$. In the induced multiple decision problem the possible decisions
therefore correspond exactly to the possible true values of $\theta$, that is, the problem
is one of point estimation.

Suppose now, as in Section 2, that the tests of $H(\theta_0)$ are nonrandomized. If
then $A(\theta_0)$ is the acceptance region for $H(\theta_0)$, a necessary and sufficient condition
for compatibility is that, except on a null set,

$$A(\theta_1) \subset A(\theta_2), \quad \text{whenever } \theta_1 \le \theta_2, \tag{4.2}$$

and

$$\bigcap_{\theta > \theta_0} A(\theta) = A(\theta_0). \tag{4.3}$$

That this condition is necessary is obvious since the corresponding relationships
do hold for the $\omega(\theta_0)$'s. To prove sufficiency one must, by the criterion given at
the end of Section 2, show that each sample point lies in one of the intersections
(4.1) with $\omega(\theta_0)$ replaced by $A(\theta_0)$. Consider now the set of $\theta_0$'s for which the
sample point is in $A(\theta_0)$, and let $\hat\theta$ be its greatest lower bound. Then by (4.2) and (4.3),
$H(\theta_0)$ is accepted for $\theta_0 \ge \hat\theta$ and rejected for $\theta_0 < \hat\theta$, and hence the sample point
lies in the intersection

$$\bigcap_{\theta_0 \ge \hat\theta} A(\theta_0) \cap \bigcap_{\theta_0 < \hat\theta} A^{-1}(\theta_0),$$

as was to be proved. The decision taken in this case is that $\theta$ lies in the set

$$\bigcap_{\theta_0 \ge \hat\theta} \omega(\theta_0) \cap \bigcap_{\theta_0 < \hat\theta} \omega^{-1}(\theta_0) = \{\hat\theta\},$$

that is, that $\theta$ equals $\hat\theta$, which therefore is a point estimate of $\theta$. The relationship

$$\hat\theta \le \theta_0 \iff x \in A(\theta_0)$$

shows furthermore that measurability of the sets $A(\theta_0)$ implies that of the
function $\hat\theta$, and conversely. One also sees from it that $\hat\theta$ is a lower confidence limit
for $\theta$ with confidence coefficient $1 - \alpha$, if all the tests are carried out at level $\alpha$.
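
The construction of the estimate as the greatest lower bound of the accepted values can be sketched for a single normal observation with known standard deviation. This setting, the function names, and the bisection bracket of 20 standard deviations are assumptions made for illustration; the bracket is adequate only for levels that are not astronomically small.

```python
from statistics import NormalDist


def accepts(x, theta0, sigma, alpha):
    """Level-alpha test of 'theta <= theta0': accept when x <= theta0 + z * sigma."""
    z = NormalDist().inv_cdf(1 - alpha)
    return x <= theta0 + z * sigma


def estimate_from_tests(x, sigma, alpha, iterations=200):
    """Bisection for the greatest lower bound of the accepted theta0."""
    low, high = x - 20 * sigma, x + 20 * sigma   # low is rejected, high accepted
    for _ in range(iterations):
        mid = (low + high) / 2
        if accepts(x, mid, sigma, alpha):
            high = mid
        else:
            low = mid
    return high
```

In this normal case the acceptance regions are nested, and the bisection reproduces the familiar lower confidence limit $x - z_{1-\alpha}\sigma$, in line with the remark on the confidence coefficient $1 - \alpha$.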

If $\theta^* < \hat\theta$, the hypothesis $H(\theta_0)$ is incorrectly rejected for $\theta^* \le \theta_0 < \hat\theta$ and
never falsely accepted. Thus in accordance with (2.4) the loss may be taken as
$a(\hat\theta - \theta^*)$, where the loss for false rejection is assumed to be $a$ for all of the
hypotheses $H(\theta_0)$. Similarly, when $\hat\theta < \theta^*$, the loss is $b(\theta^* - \hat\theta)$. The loss is
therefore the absolute error, multiplied by $a$ or $b$ as $\hat\theta$ is an over- or underestimate.
If these two kinds of error are considered of equal importance, the loss is simply
proportional to the absolute error.
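
The resulting loss for point estimation is a weighted absolute error, which the following short Python sketch writes out. The function name is hypothetical; the weights a and b are the losses for false rejection and false acceptance described above.

```python
def estimation_loss(estimate, true_value, a, b):
    """Weighted absolute error: a per unit of overestimation, b per unit of underestimation."""
    if estimate > true_value:
        return a * (estimate - true_value)
    return b * (true_value - estimate)
```

Taking a = b makes the loss proportional to the ordinary absolute error.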

If the losses $a$ and $b$ are the same for the different hypotheses $H(\theta_0)$, and if
(3.1) is assumed to hold, then the different tests must be carried out at a constant
level of significance $\alpha$. Under this assumption, the optimum one-sided tests
satisfy (4.2) and (4.3) in many of the standard problems, in particular when one
is dealing with an exponential family of distributions. On the other hand, these
conditions may also hold in cases in which the losses $a$ and $b$, and hence also the
level $\alpha$ at which $H(\theta_0)$ is tested, vary with $\theta_0$. As an example suppose that the
tests have acceptance regions of the form $T \le C(\theta_0)$, where the c.d.f.
$F_\theta(c) = P_\theta\{T \le c\}$ is for each $c$ a continuous and decreasing function of $\theta$. Then
$C(\theta_0) = F_{\theta_0}^{-1}[1 - \alpha(\theta_0)]$, and conditions (4.2) and (4.3) are satisfied provided $C(\theta_0)$ is an
increasing function of $\theta_0$, which is continuous on the right, or equivalently if $\alpha(\theta_0)$ is
decreasing and continuous on the right.

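Slightly more generally, one can take the losses for over- and underestimation
to be

$$a \int_{\theta^*}^{\hat\theta} d\mu(\theta), \qquad b \int_{\hat\theta}^{\theta^*} d\mu(\theta), \tag{4.4}$$

with $\mu$ not necessarily Lebesgue measure. In case of a scale parameter, for
example, an appropriate loss function may be given by

$$a \int_{\theta^*}^{\hat\theta} \frac{d\theta}{\theta} = a \log(\hat\theta/\theta^*), \qquad b \int_{\hat\theta}^{\theta^*} \frac{d\theta}{\theta} = b \log(\theta^*/\hat\theta).$$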
. 


(ii) Point estimation after a preliminary test of significance. It is frequently of
interest to obtain a point estimate of a parameter $\theta$ after one has tested, and
rejected, some hypothesis concerning it. If for example a new treatment is being
compared with a standard one, the hypothesis may be tested that the new
treatment does not represent an improvement, that is, that $\theta = \eta - \xi \le 0$, where
$\eta$ and $\xi$ denote the means of the new and old treatments. In case the hypothesis
is rejected one requires an estimate of $\eta - \xi$.

A procedure for testing the hypothesis $H: \theta \le \theta_0$, and estimating $\theta$ in case of
rejection, can be generated by the set of hypotheses $H(\theta_1): \theta \le \theta_1$, with $\theta_1 \ge \theta_0$.
If the tests are carried out at a constant level $\alpha$, the procedure consists in
performing the usual test of $H(\theta_0)$ and in case of rejection estimating $\theta$ by the
estimate $\hat\theta$ of (i), that is, by the lower confidence limit corresponding to confidence
coefficient $1 - \alpha$. A drawback of this method is the limitation it imposes on the
levels $\alpha(\theta)$. The conservative attitude reflected by the customary choice of a
small level $\alpha$ for testing $H$ suggests that also in the estimation part of the
problem an overestimate should be considered more serious than the corresponding
underestimate. However, one would usually still wish to test $H$ at a lower level
than is desirable for the construction of the estimate. Unfortunately such a
combination of levels leads to an incompatible procedure.

In some cases a procedure with the desired properties can be obtained by a
slight modification of the construction given above. To illustrate the method
consider a single normal variable $X$ with mean $\theta$ and unit variance. The hypothesis
$H: \theta \le 0$ is to be tested at level $\alpha < \frac{1}{2}$ with the acceptance region $X \le C$, and
in case of rejection $\theta$ is to be estimated by $X$, which corresponds to the level $\frac{1}{2}$.
This may be generated by the family of hypotheses $H(\theta_1): \theta \le \theta_1$, with $\theta_1 = 0$
and $\theta_1 \ge C$, at the levels $\alpha(0) = \alpha$, $\alpha(\theta_1) = \frac{1}{2}$ for $\theta_1 \ge C$. In a similar manner
one can generate a joint testing and estimation procedure, in which the level for
the estimation part of the problem is higher than that of the test, in the case of a
binomial or Poisson variable. Another example in which this is possible is that of
the ratio $\tau^2/\sigma^2$ of two variances (for example in components of variance problems),
where one wishes to test $H: \tau^2/\sigma^2 \le k$, and in case of rejection requires a point
estimate of the variance ratio. The method, however, does not appear to be
applicable without further modification to the case of a sample $X_1, \ldots, X_n$ from
$N(\xi, \sigma^2)$ on the basis of which one wishes to test $H: \xi \le 0$ and in case of rejection
to use, say $\bar{X}$, as an estimate of $\xi$. While this problem may be generated by the
class of hypotheses $H(\xi_0): \xi \le \xi_0$ with $\xi_0 = 0$ and $\xi_0 \ge CS$, this class depends
on the random variable $S$, and can therefore be determined only after the
observations have been taken.

The indicated difficulty usually does not exist if the estimation problem arises
when $H$ is accepted rather than when it is rejected. An example of this occurs
when one wishes to test the hypothesis that a drug has a significant toxic effect
($H: \theta \ge \theta_0$), and in case $H$ is accepted wants to estimate the size of this effect.


5. Some nonparametric problems. (i) Testing for goodness of fit. Let
$X_1, \ldots, X_n$ be independently distributed with cumulative distribution function $F$, and
consider the problem of deciding whether $F = F_0$, or, if this is judged not to be the
case, of determining the sets of points $u$ for which $F(u)$ is $<$, $=$, and $> F_0(u)$.
This problem may be generated by the set of hypotheses $H_+(u): F(u) \ge F_0(u)$
and $H_-(u): F(u) \le F_0(u)$. If in order to be specific we assume $F_0$ to be
continuous and strictly increasing, the set of $u$ for which the true $F(u)$ exceeds $F_0(u)$ is a
union of intervals each of which is open on the right. It is necessary for
compatibility that the corresponding condition hold for the set of $u$ at which $F(u)$ is
judged to exceed $F_0(u)$, that is, for the set of $u$ for which the sample point is in
$A_+(u) \cap A_-^{-1}(u)$. A similar condition must be satisfied by the sets $A_+^{-1}(u) \cap A_-(u)$.
These conditions are clearly also sufficient for compatibility since given
any two such unions of intervals, there exists a cumulative distribution function
$F$ which is in the desired relationship to $F_0$.

The best unbiased tests of the hypotheses $H_+(u)$ and $H_-(u)$ are the
appropriate one-sided sign tests, which reject the hypotheses if the number $X(u)$ of
observations $\le u$ satisfies

$$X(u) < a(u) \quad \text{and} \quad X(u) > b(u), \tag{5.1}$$

respectively. In order to achieve desired levels of significance $\alpha_+(u)$ and $\alpha_-(u)$ it
may be necessary to introduce an auxiliary random variable $Z$, distributed
uniformly on $(0, 1)$, and to reject $H_+(u)$ and $H_-(u)$ as

$$X(u) + Z < a(u) \quad \text{and} \quad X(u) + Z > b(u). \tag{5.2}$$


Unfortunately the usual choice of levels, $\alpha_+(u) = \alpha_-(u) = \alpha$, is not satisfactory
for the present problem. In fact, with this choice the tests (5.2) will always lead
to rejection for $u$ sufficiently large and sufficiently small respectively. The
difficulty stems from the circumstance that for sufficiently extreme $u$, $X(u)$ tends
to the sure variable $n$ or 0 and hence contains no information, so that the
decisions in the extreme tails depend solely on the value of $Z$. Since one is usually
not even particularly interested in the behavior of $F$ in the extreme tails, it is
natural to avoid this difficulty by choosing $\alpha_+(u)$ and $\alpha_-(u)$ in such a way that
they tend to 0 as $u$ tends to $\pm\infty$. This will be the case for example if in (5.1) one
sets

$$a(u) = nF_0(u) - \Delta \quad \text{and} \quad b(u) = nF_0(u) + \Delta,$$

so that the acceptance of all of the hypotheses simultaneously reduces to that of
Kolmogoroff's test of the hypothesis $F = F_0$.
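
A minimal sketch of the resulting pointwise classification is given below. It assumes that the cutoffs are expressed on the count scale, that is, the hypothesized expected count $nF_0(u)$ plus or minus a constant band width, and it evaluates the rule only on a finite grid of points supplied by the user. The function name, the grid, and the labels are hypothetical.

```python
def classify_cdf(sample, f0, grid, delta):
    """For each u in grid, judge whether F(u) is below, equal to, or above F0(u)."""
    n = len(sample)
    results = []
    for u in grid:
        count = sum(1 for x in sample if x <= u)   # X(u): observations <= u
        center = n * f0(u)                          # expected count under F0
        if count < center - delta:
            results.append((u, "F < F0"))   # "F(u) >= F0(u)" rejected
        elif count > center + delta:
            results.append((u, "F > F0"))   # "F(u) <= F0(u)" rejected
        else:
            results.append((u, "F = F0"))
    return results
```

Far out in either tail the count and the expected count agree up to less than the band width, so neither hypothesis is rejected there, and the statement "F = F0" at every grid point corresponds to acceptance by a Kolmogoroff-type band restricted to the grid.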

One obtains a completely analogous problem, only without the complications
caused by the behavior in the tails, if the observations are grouped. The procedure
will then decide for each interval whether the hypothesis $p_i = p_i^0$ is to be
accepted or whether the observed frequency in the $i$th interval indicates that $p_i$
exceeds or falls short of its hypothetical value. This is a special case of the
classification problems considered in (ii) of Section 3.

(ii) The two-sample problem. The problem of deciding whether two unknown
cumulative distribution functions $F$ and $G$ are equal, or in the contrary case of
determining for each $u$ whether $G(u)$ is $<$, $=$, or $>$ $F(u)$, may be generated by
the hypotheses $H_+(u): G(u) \ge F(u)$ and $H_-(u): G(u) \le F(u)$. Let $X_1, \ldots, X_m$
and $Y_1, \ldots, Y_n$ be samples from $F$ and $G$, and denote by $X(u)$ and $Y(u)$ the
number of observations in these samples that are $\le u$. The appropriate tests of
$H_+(u)$ and $H_-(u)$ are the standard one-sided tests for equality of two binomial
distributions, and with $\alpha_+(u) = \alpha_-(u) = \alpha < \tfrac{1}{2}$ they are clearly compatible.
This choice of levels, as in the previous case, puts the weight of the decision in the
tails on an irrelevant random experiment. But this is less serious in the present
problem since the very small value of $\alpha$ that is required for satisfactory error
control implies that one will only rarely reject the hypotheses in the tails, where
$X(u) = Y(u) = 0$ or $X(u) = m$, $Y(u) = n$.
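As a rough illustration of a one-sided comparison of two binomial counts, the following Python sketch computes a conditional (hypergeometric) upper-tail probability. It is an assumption that this conditional form is the test meant by "standard"; the function name is hypothetical, and the randomization needed to attain an exact level is omitted.

```python
from math import comb


def upper_tail_probability(x, m, y, n):
    """P(Y' >= y | X' + Y' = x + y) when both samples share one success probability.

    x of m observations in the first sample and y of n in the second are <= u.
    A small value is evidence that G(u) > F(u), i.e. against H_-(u).
    """
    t = x + y                                # total number of observations <= u
    total = comb(m + n, t)                   # ways to place t successes among m + n trials
    upper = min(n, t)
    favourable = sum(comb(n, k) * comb(m, t - k) for k in range(y, upper + 1))
    return favourable / total


moderate = upper_tail_probability(0, 3, 3, 3)   # all successes fall in the second sample
lower_tail = upper_tail_probability(0, 5, 0, 7)  # X(u) = Y(u) = 0
upper_extreme = upper_tail_probability(5, 5, 7, 7)  # X(u) = m, Y(u) = n
```

In the two extreme cases the conditional distribution is degenerate and the tail probability equals 1, so without randomization no rejection can occur there; this is the situation in which the decision would rest entirely on the auxiliary random experiment.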

(iii) Estimating a cumulative distribution function. Let $X_1, \ldots, X_n$ be independently
distributed with cumulative distribution function $F$, and consider the
hypotheses $H(u_0, p_0): F(u_0) \le p_0$. As in Section 4(i), if $F^*$ denotes the true
c.d.f., the hypotheses $H(u_0, p_0)$ for which $F^*(u_0) \le p_0$ are true, and those with
$F^*(u_0) > p_0$ are false. The associated intersection (2.1) is therefore


(5.3)  $\bigcap_{\{(u_0, p_0):\, p_0 \ge F^*(u_0)\}} \omega(u_0, p_0) \;\cap\; \bigcap_{\{(u_0, p_0):\, p_0 < F^*(u_0)\}} \omega^{-1}(u_0, p_0).$

Since $\omega(u_0, p_0)$ is the set of all $F$ for which $F(u_0) \le p_0$, the first and second
member of (5.3) are the sets of all $F$ satisfying $F(u) \le F^*(u)$ for all $u$ and
$F(u) \ge F^*(u)$ for all $u$, respectively. The set (5.3) therefore contains as its only element
the c.d.f. $F^*$.

It is seen that for each fixed $u_0$, if we set $\theta = F(u_0)$, we are dealing with the
problem of Section 4(i), so that in particular the family of sign tests of the hypotheses
$H(u_0, p)$ based on the binomial variable $X(u)$ leads to the estimate
$\hat\theta = \hat F(u_0)$ derived there. However, for compatibility one must now add the requirement
that as $u$ varies the $\hat F(u)$ should constitute a c.d.f. This condition is
violated in a rather trivial way if one puts $\alpha(u, p) = \alpha$. For when $X(u)$ is 0 or
$n$, the estimate $\hat F(u)$ is not 0 and 1 but only close to these values. One can achieve
compatibility by putting $\alpha(u, p) = \alpha(u)$, and letting $\alpha(u)$ tend to 0 as $u$ tends to
$\infty$, and 1 as $u$ tends to $-\infty$. Since it is enough to make this change in the extreme
tails, it need not affect the result in practice.


6. Restricted products of decision problems. The method, described in Section 2,
of generating a multiple decision problem from a set of hypotheses is a special case
of the following process. Consider the definition of a general decision problem in
terms of a family of distributions $\mathcal{P} = \{P_\theta, \theta \in \Omega\}$, a space of possible decisions
$D = \{d\}$ and a loss function $W(\theta, d)$. Suppose that $\mathcal{P}$ is fixed but that two different
decision spaces $D'$, $D''$ with the loss functions $W'$, $W''$ are of interest. From
the two associated decision problems one can form a new problem, which consists
in the simultaneous consideration of the two given ones, and may be termed their
product. Its decision space is the Cartesian product $D' \times D''$ and the loss resulting
from the decision $d = (d', d'')$ is


(6.1)  $W(\theta, d) = W'(\theta, d') + W''(\theta, d'')$,


or slightly more generally $W(\theta, d) = pW'(\theta, d') + (1 - p)W''(\theta, d'')$. A typical
illustration is the first example of Section 1, where the component problems are
concerned with the classification of $\xi$, respectively $\eta$, as negative, zero, or positive,
and where the product problem is that of simultaneously classifying both
parameters.

This concept is however not general enough to cover most of the other problems
considered in the previous sections. Consider for example the two hypotheses
$H': \theta \le \theta_0$ and $H'': \theta \ge \theta_0$. If we denote the decisions to accept $H'$ and $H''$ by
$d_0'$ and $d_0''$ and the decisions to reject by $d_1'$ and $d_1''$, the product problem offers
the choice of the four decisions $(d_0', d_0'')$, $(d_0', d_1'')$, $(d_1', d_0'')$, $(d_1', d_1'')$. Of these the
first corresponds to the parameter point $\theta = \theta_0$, the second and third to the sets
$\theta < \theta_0$ and $\theta > \theta_0$, while the last one combines two inconsistent decisions and
hence corresponds to an empty set in the parameter space. In order to obtain the
problem of choosing only between the first three of these possibilities, one must
eliminate the point $(d_1', d_1'')$ from the decision space $D$. In general, we shall speak
of a restricted product if in a product problem some of the decision pairs $(d', d'')$
are omitted from $D' \times D''$, so that $D$ is a subset of $D' \times D''$. Given any procedures
$\delta'$, $\delta''$ for the problems with decision spaces $D'$ and $D''$, let $\delta = (\delta', \delta'')$ be
the procedure that takes decision $(d', d'')$ when $\delta' = d'$ and $\delta'' = d''$. In conformance
with our earlier terminology we shall say that the pair $(\delta', \delta'')$ is compatible
with the given set of restrictions if


$P_\theta\{(\delta'(X), \delta''(X)) \in D\} = 1$ for all $\theta$,


that is, if the probability is zero of the procedure $(\delta', \delta'')$ leading to one of the
forbidden elements of $D' \times D''$. Under suitable measurability conditions there
is then again a 1:1 correspondence between compatible pairs of decision procedures
for the component problems and decision procedures for the restricted
product problem. The proof is exactly as that of Theorem 1.
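The three-decision example above can be made concrete with a small Python sketch. Everything specific here is an assumption chosen for illustration: the component tests are simple threshold rules on a single real observation `x`, the names are hypothetical, and compatibility is only checked on a finite grid of sample points rather than with probability one.

```python
def reject_h_prime(x, c1):
    """Component test of H': theta <= theta0. Returns 1 (reject) when x > c1."""
    return 1 if x > c1 else 0


def reject_h_double_prime(x, c2):
    """Component test of H'': theta >= theta0. Returns 1 (reject) when x < -c2."""
    return 1 if x < -c2 else 0


# Restricted decision space D: the pair (1, 1), rejecting both hypotheses, is omitted.
ALLOWED = {(0, 0), (0, 1), (1, 0)}
MEANING = {(0, 0): "theta = theta0", (0, 1): "theta < theta0", (1, 0): "theta > theta0"}


def product_decision(x, c1, c2):
    """Decision pair (d', d'') taken by the product procedure at the sample point x."""
    return (reject_h_prime(x, c1), reject_h_double_prime(x, c2))


def is_compatible_on(points, c1, c2):
    """True if no listed sample point leads to a forbidden decision pair."""
    return all(product_decision(x, c1, c2) in ALLOWED for x in points)


grid = [k / 10 for k in range(-50, 51)]
compatible = is_compatible_on(grid, 1.0, 1.0)        # rejection regions x > 1 and x < -1
incompatible = is_compatible_on(grid, -1.0, -1.0)    # rejection regions x > -1 and x < 1 overlap
```

With disjoint rejection regions the pair of tests defines a procedure for the restricted product; with overlapping regions some sample points would call for the forbidden pair.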


j f(z, 6;) P f(z, 6_1) oe we [aT (ey a 
sie f(x, | kag ~ Lr@) Lr@y J =" 


Hence, for each m and all «‘” 


; . | Pra(z”) D-im(x”) 
(4.13) min | ————., — ———= | Sh 
Pom(x'™) Pom(x ™)) 


where « = min [1/2p, 1/2(1 — p)]. Equation (4.10) follows at once from equa- 
tion (4.13). This completes the proof of Lemma 4.2. 

We now prove some consequences of Assumptions A, B, and C. In what 
follows ¢° is a fixed a priori probability measure all of whose components are 
positive. The reader should recall the italicized statement a few lines above 

(0) (0) 


° a 0) , (0) (0 (0) 0) 
Assumption A. Write D™ = &°)/&’, where &” = (& 1, & , &.). 


CORRECTION 
THE ANNALS OF MATHEMATICAL STATISTICS 
Volume 28, No. 1 March 1957 


Page 14, formula (4.12) through page 17, line 23 should be exchanged with page 70, line 8 
through page 72, next-to-last line. 







Lemma 4.3. Under Assumption A there exist positive constants b,, and ay, (m = 
0, 1, 2, ---) with b,, S a,, and such that 


(4.14) ge 2 af and only if Pin(t”) / p-im(z’™) = an, 


(4.15) €™ eC, ifandonlyif — pim(z“”) / pin(z’”) S ba, 


with strict inequality holding if and only if §‘” is an interior point of the appropriate 
C;. 

(Of course, the values a» , b» depend on —’.) 

Proor: The method of proof is similar to that of Theorem 3. Let 2°” and 
y’” be such that (4.3) holds, and let ¢°” (2°), €°” (y™) be the a posteriori proba- 
bility measures corresponding to observed values z‘”, y‘”. Equation (4.15) 
will follow if we can show that ¢°(z2°”) ¢« C_, implies ¢°”’(y”) © C_1. (The 
reader will be aided in what follows if he draws a picture.) Now, (4.3) says that 
the line Vot'”’(y‘”’) lies toward V_, from (or on) the line Vet” (z‘”). More- 
over, (4.3) implies (4.4) and (4.5), which say that the line V,¢°" (y””) lies toward 
V_, from (or on) the line V,¢°”(z"”) and that the line V_,¢‘”’(y°”’) lies toward 
V. from (or on) the line V_,¢°”(2°”). Hence, ¢”(y‘”) lies inside or on the 
triangle 7 whose vertices are V_,, ¢°”(z‘”), and the intercept of V,¢°” (2°) 
with V_,Vo. Since 7 is contained in the triangle Vot”(2“)V_1 which (by 
convexity) is contained in C_, , (4.15) is proved. Moreover, since the last part 
of Assumption A implies that ¢°”(y‘”) could lie on the line V_,é‘” (x) only if 
em (y™) is Vy or E(x”), it is clear that ¢°”(y"”) is a boundary point of 
C_, if and only if either ¢“”(y°”) = V_, (see the italicized remark a few lines 
above Assumption A in this case) or else ¢“”(2”) is a boundary point of C_, 
and ¢°"(y™) = ¢°"(2"”); the latter implies equality in (4.3). Thus (modulo the 
italicized remark), defining b% to be (for fixed ¢) the supremum (over x”) of 
those values pin(z'”) / p-im(x’”) for which ¢”(2°”) ¢ C_y, and taking b,, = 
bx if ¢°" (x) # V_, is on the boundary of C_, for some x” and ba < bn < 
infimum of those pim(x°”) / poim(a”) for which ¢”(2°”) ¢C_, otherwise, 
we see from the previous sentence and the fact that pin(y°”) / p-im(y’”) < 
Pim(a”) / porm(x”) if EM (a) - E™(y™) © T, that the last part of the 
lemma as it applies to (4.15) is proved. Equation (4.14) (and the corresponding 
last part) is proved similarly. 

Lemma 4.4. Under Assumptions A and B, there exist constants b, < D® < a, 
of Lemma 4.3 satisfying bm S bm4i, Gm 2 Amar, form = 0, 1, 2, --- 

Proor: We shall prove the assertion regarding the b,, , a similar proof applying 
for the a,,. Keeping & fixed as before, in order to prove b» S< bm4: it clearly 
suffices to prove that £°” ¢ C_y and pr,mys(t"”) / perma (ee?) Sima") / 
p-i,m(x'™) imply that &"*» ¢ C_, (the case where either ratio is 0/0 or where 
both are 0 is easily disposed of); i.e., that &™ ¢ C_; and fi(tma1) S fa(@m41) im- 
ply "+ ¢ C_,. The last inequality says that the line Vot"*” lies toward V_, 
from (or on) the line Vot™ ; by (4.8) and (4.9), it implies that the line V_,é°"*» 
lies toward V° from (or on) the line V_,é‘™. Thus, é'"*” lies in the triangle 





Voe(™V_, and hence, by convexity, &"*+» ¢« C_,. The remaining part of the 
lemma follows at once from the fact that ¢{" < &°7 is equivalent to pim / 
P-im = D®, 

If fi(z) / f-s(x) cannot take on a suitably dense set of values, the a,, and b, 
might (for fixed ¢”) not be unique and might correspond to ¢‘” in the interior 
of the C; or the complement of C_,; U C,. If this is not the case, the previous 
paragraph and the fact that VoPV_, ¢ C_, show that we can strengthen the 
weak inequality of Lemma 4.4. One possible formulation of this result is the 
following: 

Lemma 4.5. If bn < D® (resp., am > D®) and if for every open interval J con- 
taining bm (resp., m) the ratio Pim(z’”) / p_r,m(x°”) takes on values in J — {bm} 
(resp., J — {am}) with positive probability under H, and H_, (so that bm, Gm are 
unique), and if for every open interval J’ containing D® as a left (resp., right) 
end-point fi(x) / f(x) takes on values in J’ with positive probability under Hy, 
and H_,, then bm < Dm4a(resp., Gm>Om41). In particular, in case (3) of Lemma 
4.2, if uw is equivalent to Lebesgue measure (or if Lebesgue measure is absolutely 
continuous with respect to u) on the real line, the a,, and b,, are unique and this last 
result holds. 

In fact, it remains only to prove the last assertion of the lemma, which follows 
at once from the fact that e“'*-"* takes on values in any interval of positive 
numbers with positive probability under H, and H_,, if Lebesgue measure is 
absolutely continuous with respect to yz. 

Lemma 4.6. Under Assumption C, for any &° all of whose components are (or in 


r (0) 


fact, for which &° is) positive, there is an integer N = N(#) such that every 
Bayes solution with respect to &° requires fewer than N observations with proba- 
bility one under fi , f-1, and fo. 

Proor: Fix ¢”. Since P is a positive distance from Vo, there is clearly a 
positive number c such that every Bayes solution must stop with probability 


. ( ( ) , ( 
one whenever either pin(z"”) / pom(z’”) < c¢ or else pim(x'”) / pom(x'”) < c. 


The desired result now follows at once from (4.10). (Note again the remark 
made in italics just before Assumption A). 

We may now summarize our results:

THEOREM 4. Under Assumptions A, B, and C (in particular, under (1), (2), 
or (3) of Lemma 4.2), any procedure which minimizes Ao(5) subject to (4.1) is a 
GSPRT of H, against H_, with bm S bmi S D© S Anis S Gm for m = 0,1, 2, 


-++,N and some D, which stops with probability one under f;(i = 0, +1) after 
N or fewer observations. Under additional conditions specified in Lemma 4.5 the 
values dm , bm(m = 1) will be unique and am > Amy1 OF bm < Bias unless am = D© 
or b», = D®, where D corresponds to the a priori distribution with respect to which 
the optimum procedure is Bayes. 

REMARKS, GENERALIZATIONS, Etc. 

1. Of course, a GSPRT of Theorem 4 involves a randomization rule for all 
m < N, including a possibly randomized starting rule (m = 0) if a or bb) =D”. 
If uw is nonatomic, there will clearly exist an optimum GSPRT involving no 
randomization, except possibly in the starting rule. The lack of uniqueness of 





the a,, and b,, in cases not covered by Lemma 4.5 is of course inessential, re- 
flecting only that certain intervals of values of pim/p-im have probability zero 
under all f; . 

2. In all of the above, the X; are random elements whose range is immaterial 
(not necessarily real) as long as the appropriate assumptions are satisfied. To 
conserve space we have not included statements about the obvious sets of 
measure zero where various conditions may be permitted to fail. 

3. As an example of what can happen when our assumptions are not satisfied, 
we mention briefly the following example: Suppose f(z) = 1 / x[l + (x — j)’], 
j = 0, +1, uw = Lebesgue measure. In this Cauchy case it is easy to see that, 
for ¢ with all components positive, the set of possible ¢° values is a simple 
closed curve minus the point ¢°,, and lies entirely in the interior of the triangle
VoV,V_, . Assumption A is not satisfied, and there is no reason why the result of 
Lemma 4.3. should be valid. Also, since (e.g.) fi(V3/2f ( -V 372) does not 
depend on j, there is no reason why the result of Lemma 4.6 should hold here. 

4. Remarks analogous to those of Section 2 can be made here: for concave 
(resp., convex) nondecreasing c(n), the minimization of a linear combination 
such as (4.2) with Ao(4) replaced by Eoc(n) under 6 may be compared in an 
obvious fashion to the minimization when c(n) is replaced by the linear homo- 
geneous cost function ca(n) passing through (1, c(m + 1) — c(m)) (i.e., to the 
solution of the problem we have considered): the stopping region will now change 
with m, being contained in (resp., containing) that fixed region for the problem 
concerning c»(n). 

To prove the procedures unbiased, we note that unbiasedness of a two-decision
procedure $\varphi$ with losses $a$ and $b$, by (6.3) is equivalent to


(7.4)  $E_\theta\varphi(X) \le b/(a + b)$ for $\theta \in \omega$,  $E_\theta\varphi(X) \ge b/(a + b)$ for $\theta \in \omega^{-1}$,


that is, to the Neyman-Pearson condition of unbiasedness at the level

(7.5)  $\alpha = b/(a + b)$.


The result now follows from (iv) of Section 6 since in all of the examples the procedures
were obtained as products of unbiased tests.
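Conditions (7.4) and (7.5) can be checked numerically in a simple case. The sketch below is an illustration only: it assumes a single observation $X \sim N(\theta, 1)$, the hypothesis $\theta \le \theta_0$, and a test that rejects when $X$ exceeds a critical value; all function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def level_from_losses(a, b):
    """Level (7.5) implied by loss a for false rejection and b for false acceptance."""
    return b / (a + b)


def rejection_probability(theta, critical_value):
    """E_theta phi(X) for the test 'reject when X > critical_value', X ~ N(theta, 1)."""
    return 1.0 - normal_cdf(critical_value - theta)


def is_unbiased_on_grid(a, b, critical_value, thetas, theta0=0.0, tol=1e-12):
    """Check (7.4) on a finite grid: power <= alpha on omega, >= alpha off omega."""
    alpha = level_from_losses(a, b)
    for theta in thetas:
        p = rejection_probability(theta, critical_value)
        if theta <= theta0 and p > alpha + tol:
            return False
        if theta > theta0 and p < alpha - tol:
            return False
    return True


thetas = [k / 4 for k in range(-12, 13)]
unbiased = is_unbiased_on_grid(1, 1, 0.0, thetas)   # alpha = 1/2, cut-off at theta0
biased = is_unbiased_on_grid(1, 1, 1.0, thetas)     # same losses, cut-off too large
```

With equal losses the level is 1/2, and only the cut-off at $\theta_0$ satisfies (7.4); a larger cut-off makes the rejection probability fall below $\alpha$ at alternatives close to $\theta_0$.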

Unfortunately, as has already been pointed out, it is in general not true that unbiasedness
of a product implies the same property for the component problems.
Suppose however that for every test of $H: \theta \in \omega$, the power function $E_\theta\varphi(X)$ is a
continuous function of $\theta$. Then unbiasedness of $\varphi$ entails the somewhat weaker
condition of similarity on the boundary, namely


(7.6)  $E_\theta\varphi(X) = \alpha$ for $\theta \in \Lambda$,


where $\Lambda$ is the common boundary of $\omega$ and $\omega^{-1}$. For an important class of testing
problems, there exists not only among all unbiased tests but also among the larger
class of tests satisfying (7.6), one that uniformly maximizes $E_\theta\varphi(X)$ for $\theta \in \omega^{-1}$
and uniformly minimizes it for $\theta \in \omega$. This test therefore, among all those that
are similar on the boundary, uniformly minimizes the risk (7.1). As was shown in


[11], this is the case in particular when $\theta = (\theta_1, \ldots, \theta_s)$, the distributions of $\mathcal{P}$
form an exponential family and $\omega$ is of the form $\theta_i \le \theta_i^0$ or $\theta_i \ge \theta_i^0$.

The desired optimum property of the various procedures discussed in the earlier
sections, for finite $\Gamma$, is an easy consequence of the above remarks and the
following theorem.

THEOREM 2. Let $\{H_\gamma: \theta \in \omega_\gamma, \gamma \in \Gamma\}$ be a finite family of hypotheses, and suppose
that for each $\gamma$ the test $\varphi_\gamma^0$ uniformly minimizes the risk among all tests that are similar
on the boundary at level $\alpha_\gamma = b_\gamma/(a_\gamma + b_\gamma)$, and that the family $\{\varphi_\gamma^0, \gamma \in \Gamma\}$ is
compatible. Suppose further that the following structural assumption is satisfied.

(*) For every $\gamma_0 \in \Gamma$ and each common boundary point $\theta$ of $\omega_{\gamma_0}$ and $\omega_{\gamma_0}^{-1}$ there exist
intersection sets $\Omega_i$ and $\Omega_j$ of the form (2.1) such that $\theta$ is also a common boundary
point of $\Omega_i$ and $\Omega_j$, and such that $x_{i\gamma} = x_{j\gamma}$ for all $\gamma \ne \gamma_0$ but that $x_{i\gamma_0} \ne x_{j\gamma_0}$.

Then if $E_\theta\varphi_\gamma(X)$ is continuous in $\theta$ for each $\gamma$, the product procedure $\psi^0$ given by
vi = IT) 
yer 
is unbiased, and uniformly minimizes the risk among all unbiased decision procedures
of the restricted product problem, the components of which are the problems of
testing $H_\gamma$ with losses $a_\gamma$ and $b_\gamma$.

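PROOF. Let $\mathcal{C}$ be the class of all decision procedures the component tests of
which are similar on the boundary at level $\alpha_\gamma$. It follows from the assumptions
made and from (iii) of Section 6 that the procedure $\psi^0$ uniformly minimizes the
risk within $\mathcal{C}$. Since the tests $\varphi_\gamma^0$ are unbiased (as is seen by comparison with
the tests $\varphi_\gamma(x) \equiv \alpha_\gamma$), the same is true of $\psi^0$. Let $\mathcal{C}_0$ denote the class of all unbiased
procedures of the product problem. We shall now show that $\mathcal{C}_0 \subset \mathcal{C}$, which will
complete the proof.

Let $\psi$ be any procedure belonging to $\mathcal{C}_0$, let $\gamma_0$ be any element of $\Gamma$ and $\theta_0$ any
boundary point of $\omega_{\gamma_0}$ and $\omega_{\gamma_0}^{-1}$. Let $\Omega_i$ and $\Omega_j$ be the sets, the existence of which is
guaranteed by (*), and assume without loss of generality that $x_{i\gamma_0} = 1$. Then
unbiasedness of $\psi$ implies that for any $\theta$ in $\Omega_i$,


$E_\theta \sum_{\gamma} [(x_{i\gamma} + 1)a_\gamma\varphi_\gamma(X) - (x_{i\gamma} - 1)b_\gamma\varphi_\gamma^{-1}(X)] \le E_\theta \sum_{\gamma} [(x_{j\gamma} + 1)a_\gamma\varphi_\gamma(X) - (x_{j\gamma} - 1)b_\gamma\varphi_\gamma^{-1}(X)]$,

where the $\varphi_\gamma$ are the component tests of $\psi$ and $\varphi_\gamma^{-1} = 1 - \varphi_\gamma$ is the probability
of accepting $H_\gamma$. Since $x_{i\gamma} = x_{j\gamma}$ for $\gamma \ne \gamma_0$, $x_{i\gamma_0} = 1$,
$x_{j\gamma_0} = -1$, this reduces to

$a_{\gamma_0}E_\theta\varphi_{\gamma_0}(X) \le b_{\gamma_0}E_\theta[1 - \varphi_{\gamma_0}(X)]$.


Analogously the opposite inequality is seen to hold for any $\theta$ in $\Omega_j$. Because of the
continuity of $E_\theta\varphi_{\gamma_0}(X)$ it follows that equality must hold on the boundary of
$\Omega_i$ and $\Omega_j$, and hence in particular for $\theta = \theta_0$. Thus


$E_{\theta_0}\varphi_{\gamma_0}(X) = \frac{b_{\gamma_0}}{a_{\gamma_0} + b_{\gamma_0}} = \alpha_{\gamma_0}$


for every boundary point of $\omega_{\gamma_0}$ and $\omega_{\gamma_0}^{-1}$, as was to be proved.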





The assumptions of this theorem may be weakened slightly at one point, which
is important for applications. If $\Lambda_\gamma$ is the common boundary of $\omega_\gamma$ and $\omega_\gamma^{-1}$, it is
not necessary for (*) to be satisfied at every point of $\Lambda_\gamma$ but enough if it holds for
the points of a dense subset $\Lambda_\gamma^0$. The proof then shows that $E_\theta\varphi_\gamma(X) = \alpha_\gamma$ for
all $\theta \in \Lambda_\gamma^0$, and by continuity the equality holds as before for all points of $\Lambda_\gamma$.

The remaining conditions being automatically satisfied for any problem that
is generated by one-sided hypotheses concerning one of the parameters in an exponential
family, it is necessary only to verify (*) in order to prove the desired
optimum property for the various examples of Section 3. This requires no further
reference to the possible distributions of the observable random variables, since
(*) concerns only the structure of the multiple decision problem, that is, of the
sets $\Omega_i$, not the distributions that are represented by the points $\theta \in \Omega$.

In the first example of Section 3, the only boundary point of $\omega_i$ and $\omega_i^{-1}$ ($i =
1, 2$) is $\theta_0$. But this is also the boundary point of $\Omega_i = \omega_i\omega_j$ and $\Omega_j = \omega_i^{-1}\omega_j$ ($i \ne j$),
and hence (*) is satisfied. The result here is slightly stronger than the one given
by the author in [8] since the condition of unbiasedness can be seen to be less
stringent than the restriction imposed on the procedure in [8]. The checking of
(*) is exactly analogous in the second version of this example, in which $\theta_0$ is replaced
by the interval $\theta_1 \le \theta \le \theta_2$.

In example (ii), the common boundary points of w; and w;' for example, are 
the points with —§ = & . Let (&, 7) be any such point, and suppose without loss 
of generality that 7 < 0. Then (&, 7) is also a boundary point of Q; = ww. and 
2; = w; w , as was to be proved. The other cases are verified analogously. 

In example (iii) where w;; is the parameter set 6; S 6; + A, consider the com- 
mon boundary of, say, w2 and w: . By the remark following Theorem 2, we may 
restrict attention to points of this boundary satisfying 


0:,< °° <0, <hHAtAK<8;,< +++ <6; , 


and we may assume further that all of the differences 6. — 6;, (¢ = 1, --- , 
are <A. Let 


f soo 0 i 
a = (6; , eee 6,) é we N wa Nw}? 


be any specific such point, and consider the points $(\theta_1', \theta_2, \ldots, \theta_s)$ with $0 <
\theta_1 - \theta_1' < \epsilon$. If $\epsilon$ is sufficiently small the relationship between all pairs of coordinates
will be the same as before, except that $\theta_1'$ is $<$ instead of $= \theta_2 + \Delta$.
These points are therefore in the intersection


—1 2ij 
®Wi2 N We nNw;i', 


and since 6 is a boundary point of this set as well as of the intersection differing 
from the present one only in the factor wy , (*) is verified. 


8. Optimality of the procedures of Sections 4 and 5. A basic assumption of
Theorem 2 is the finiteness of the set $\Gamma$, and the theorem is therefore not applicable
to any of the problems of Sections 4 and 5. Since the assumption was used
however only in the proof of the relationship

(8.1)  $\mathcal{C}_0 \subset \mathcal{C}$,


it will be enough in the following to prove (8.1) in each case. This means showing
for these problems that unbiasedness of a decision procedure implies that all of
its component tests are similar on the boundary.

(i) Estimation. For the problem of estimating a real-valued parameter $\theta$,
let the risk function of an estimate $Y$, in accordance with Section 4, be given by


$a\int_\theta^\infty (y - \theta)\, dP_\theta(y) + b\int_{-\infty}^\theta (\theta - y)\, dP_\theta(y)$,


where $P_\theta$ denotes the probability distribution of $Y$. The condition of unbiasedness
then becomes


(8.2)  $a\int_\theta^\infty (y - \theta)\, dP_\theta(y) + b\int_{-\infty}^\theta (\theta - y)\, dP_\theta(y) \le a\int_{\theta'}^\infty (y - \theta')\, dP_\theta(y) + b\int_{-\infty}^{\theta'} (\theta' - y)\, dP_\theta(y)$

for all $\theta$, $\theta'$. In the case that $a = b$, this states that the estimate, on the average,
is closer to the true value $\theta$ than to any other value $\theta'$.


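In the following we shall restrict attention to estimates $Y$ with finite risk.
If $\theta < \theta'$, (8.2) then reduces to


$a\left[\int_\theta^{\theta'} (y - \theta)\, dP_\theta(y) + (\theta' - \theta)\int_{\theta'}^\infty dP_\theta(y)\right] \le b\left[\int_\theta^{\theta'} (\theta' - y)\, dP_\theta(y) + (\theta' - \theta)\int_{-\infty}^\theta dP_\theta(y)\right]$.


Dividing both sides by $\theta' - \theta$, and letting $\theta'$ tend to $\theta$, we see that


$0 \le \frac{1}{\theta' - \theta}\int_\theta^{\theta'} (y - \theta)\, dP_\theta(y) \le P_\theta\{\theta < Y \le \theta'\} \to 0$,

and that similarly also the first term on the right-hand side tends to zero. In
the limit we therefore get


$aP_\theta\{Y > \theta\} \le bP_\theta\{Y \le \theta\}$, that is, $P_\theta\{Y > \theta\} \le \frac{b}{a + b} = \alpha$.


By letting $\theta'$ tend to $\theta$ from below, we find analogously that unbiasedness of the
estimate $Y$ implies


$P_\theta\{Y \ge \theta\} \ge \alpha$.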
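The two inequalities say that, under $P_\theta$, the true value $\theta$ must sit at the point of the distribution of $Y$ with upper-tail probability $\alpha = b/(a+b)$. A small numerical sketch in Python makes this visible; the discrete distribution of $Y$ (equally likely values 1, ..., 100), the losses $a = 1$, $b = 2$, and the function names are all assumptions chosen for illustration.

```python
def expected_loss(guess, ys, a, b):
    """Average of a*(y - guess) for y > guess and b*(guess - y) for y <= guess,
    over equally likely values ys of the estimate Y."""
    total = 0.0
    for y in ys:
        total += a * (y - guess) if y > guess else b * (guess - y)
    return total / len(ys)


def minimizing_value(ys, a, b, candidates):
    """Candidate value theta' at which the expected loss is smallest."""
    return min(candidates, key=lambda t: expected_loss(t, ys, a, b))


ys = list(range(1, 101))          # hypothetical distribution of Y: uniform on 1..100
a, b = 1, 2
alpha = b / (a + b)               # level (7.5), here 2/3

best = minimizing_value(ys, a, b, ys)
p_greater = sum(1 for y in ys if y > best) / len(ys)             # P{Y > best}
p_greater_or_equal = sum(1 for y in ys if y >= best) / len(ys)   # P{Y >= best}
```

Unbiasedness requires the minimizing value to be the true $\theta$ itself, so for this hypothetical distribution the estimate could be unbiased only if $\theta = 34$, where $P\{Y > \theta\} = 0.66 \le \alpha \le 0.67 = P\{Y \ge \theta\}$.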





Suppose now that the distributions of $\mathcal{P}$ constitute an exponential family,
and that they possess densities with respect to Lebesgue measure. This is the
case not only for families of univariate and multivariate normal distributions,
of gamma distributions, etc., but also when one is dealing with binomial or Poisson
variables to which, as a randomization device, one adds a variable that is
uniformly distributed over (0, 1). If then $P_\theta\{Y = c\}$ is positive for some $\theta$, it is
positive for all $\theta$. Hence


(8.3)  $P_\theta\{Y > \theta\} = \alpha$,


except possibly for a countable set of parameter values, and it follows from a
theorem of Scheffé [13] that (8.3) must hold for all $\theta$. Since it was shown in Section 4
that $Y > \theta_0$ is the rejection region of the hypothesis $H(\theta_0): \theta \le \theta_0$, and
since $\theta_0$ is the only common boundary point of $\omega(\theta_0)$ and $\omega^{-1}(\theta_0)$, this completes
the proof of (8.1).

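(ii) Estimation after a preliminary test of significance. For the problem generated
by the hypotheses $H(\theta_1): \theta \le \theta_1$ $(\theta_0 \le \theta_1)$, and with constant losses $a$ and $b$,
the risk is


$a\int_{\theta+}^\infty (y - \theta)\, dP_\theta(y) + b\int_{\theta_0+}^\theta (\theta - y)\, dP_\theta(y) + b(\theta - \theta_0)\int_{-\infty}^{\theta_0} dP_\theta(y)$  if $\theta_0 < \theta$,


and


$a\int_{\theta_0+}^\infty (y - \theta_0)\, dP_\theta(y)$  if $\theta \le \theta_0$.


It is seen exactly as before that unbiasedness implies (8.3) for all $\theta > \theta_0$, and
that this holds also for $\theta = \theta_0$ follows again from Scheffé's theorem.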

Consider instead the modified problem of Section 4 (ii) generated by the hypotheses
$H(\theta_1): \theta \le \theta_1$ with $\theta_1 = 0$ or $\theta_1 \ge C$, and with losses $a_0$, $b_0$ for $H(0)$
and $a$, $b$ for the remaining hypotheses. A compatible procedure accepts $H(0)$
when a statistic $Y \le C$, and otherwise takes $Y$ as an estimate of $\theta$. The associated
risk is


$\int_{C+}^\infty [a(y - C) + pa_0]\, dP_\theta(y)$  if $\theta \le 0$,


$b_0 p\int_{-\infty}^C dP_\theta(y) + a\int_{C+}^\infty (y - C)\, dP_\theta(y)$  if $0 < \theta \le C$,


$a\int_{\theta+}^\infty (y - \theta)\, dP_\theta(y) + b\int_{C+}^\theta (\theta - y)\, dP_\theta(y) + \int_{-\infty}^C [b(\theta - C) + b_0 p]\, dP_\theta(y)$  if $C < \theta$.


Here we have taken the measure $\mu$ of (2.4) to be Lebesgue measure for the hypotheses
$H(\theta_1)$ with $\theta_1 \ge C$ and to assign measure $p$ to the hypothesis $H(0)$.





For $\theta \ge C$ we see as before that unbiasedness implies (8.3), while for $\theta = 0$ we
find

$a_0P_0\{Y > C\} = b_0P_0\{Y \le C\}$

and hence

$P_0\{Y > C\} = \alpha_0$

by considering the condition of unbiasedness for $\theta = 0$ and $\theta' \downarrow 0$ and vice versa.

(iii) Testing for goodness of fit. The problem, as described in Section 5, is generated
by the hypotheses $H_-(u): F(u) \le F_0(u)$ and $H_+(u): F(u) \ge F_0(u)$. Let
$d_-(u)$, $d_0(u)$, $d_+(u)$ denote the decisions that the true $F(u)$ is $<$, $=$, $>$ than
$F_0(u)$ and let $\psi_-(x, u)$, $\psi_0(x, u)$, $\psi_+(x, u)$ be the probabilities with which these
decisions are taken when $x = (x_1, \ldots, x_n)$ is observed. Then the corresponding
over-all probabilities of these decisions are $P_-(u) = P_-(u, F) = E\psi_-(X, u)$,
$P_0(u)$ and $P_+(u)$. If the losses resulting from false rejection and acceptance are
$a(u)$ and $b(u)$ for both $H_-(u)$ and $H_+(u)$, and if we put


$R_-(u, F) = R_-(u) = [a(u) + b(u)]P_+(u) + b(u)P_0(u)$,
$R_0(u, F) = R_0(u) = a(u)[P_-(u) + P_+(u)]$,
$R_+(u, F) = R_+(u) = [a(u) + b(u)]P_-(u) + b(u)P_0(u)$,


the risk function is


$R(F) = \int_{S_-} R_-(u)\, d\mu(u) + \int_{S_0} R_0(u)\, d\mu(u) + \int_{S_+} R_+(u)\, d\mu(u)$,


where $S_-$, $S_0$ and $S_+$ are the sets on which the true $F(u)$ is $<$, $=$, and $>$ than
$F_0(u)$. In particular


$R(F_0) = \int R_0(u, F_0)\, d\mu(u)$.


We shall assume in the following that $R(F_0)$ is finite, that $a(u)$ and $b(u)$ are
bounded in every finite interval, and for the sake of convenience also that $F_0$
possesses a probability density.

The condition of unbiasedness becomes in the present case


$\int_{S_-} R_-(u, F)\, d\mu(u) + \int_{S_0} R_0(u, F)\, d\mu(u) + \int_{S_+} R_+(u, F)\, d\mu(u)$

$\le \int_{S_-'} R_-(u, F)\, d\mu(u) + \int_{S_0'} R_0(u, F)\, d\mu(u) + \int_{S_+'} R_+(u, F)\, d\mu(u)$,


where $S_-'$, $S_0'$ and $S_+'$ are the sets on which some alternative c.d.f. $F'$ is $<$, $=$,
and $>$ than $F_0$. Consider this condition now for some $F$ and $F'$, both of which
agree with $F_0$ except on a finite interval $I$ on which $F < F_0$ and $F' > F_0$. It
then reduces to

$\int_I R_-(u, F)\, d\mu(u) \le \int_I R_+(u, F)\, d\mu(u)$.


If, holding $I$ fixed, one considers a sequence of such distributions $F_k$ which possess
probability densities that tend to the density of $F_0$, it follows from Scheffé's
theorem that


$\int_I R_-(u, F_0)\, d\mu(u) \le \int_I R_+(u, F_0)\, d\mu(u)$,


and since $I$ was an arbitrary interval, that

$R_-(u, F_0) \le R_+(u, F_0)$.

Analogously one sees that the reverse inequality must hold, so that


$R_-(u, F_0) = R_+(u, F_0)$.


If in the above argument $F$ is replaced by $F_0$, one finds


$\int_I R_0(u, F_0)\, d\mu(u) \le \int_I R_+(u, F_0)\, d\mu(u)$


for all $I$, and hence


$R_0(u, F_0) \le R_+(u, F_0)$.


Similarly, on replacing $F'$ by $F_0$ one gets


$\int_I R_-(u, F)\, d\mu(u) \le \int_I R_0(u, F)\, d\mu(u)$


for all $I$, which by the same argument as before leads to


$R_-(u, F_0) \le R_0(u, F_0)$.


Thus, for almost all $u$,


$R_-(u, F_0) = R_0(u, F_0) = R_+(u, F_0)$,

which implies


$P_-(u, F_0) = P_+(u, F_0) = \frac{b(u)}{a(u) + b(u)} = \alpha(u)$,

as was to be proved.

A very similar proof applies in the two-sample problem discussed in Section 
5(ii), and we shall therefore not give the details. 

(iv) Estimating a cumulative distribution function. The problem of estimating a
c.d.f. was treated from a minimax point of view by Agarwal [1] for several loss
functions all of which differ from the one below. Following our earlier definitions,
we take the risk function here to be


$\int_{-\infty}^\infty \left\{ a(u)\int_{F(u)}^\infty [y - F(u)]\, dP_u(y) + b(u)\int_{-\infty}^{F(u)} [F(u) - y]\, dP_u(y) \right\} d\mu(u)$,

where $P_u = P_{u,F}$ is the distribution of the estimate $Y(u)$ of $F(u)$. As in the problem
of estimating a single parameter, we shall restrict attention to estimates
with finite risk, and the condition of unbiasedness then becomes


(8.4)  $\int_{-\infty}^\infty \left\{ a(u)\left[\int_{F(u)}^{F'(u)} [y - F(u)]\, dP_u(y) + [F'(u) - F(u)]\int_{F'(u)}^\infty dP_u(y)\right] \right\} d\mu(u)$

$\le \int_{-\infty}^\infty \left\{ b(u)\left[\int_{F(u)}^{F'(u)} [F'(u) - y]\, dP_u(y) + [F'(u) - F(u)]\int_{-\infty}^{F(u)} dP_u(y)\right] \right\} d\mu(u)$,

where the probabilities are computed with respect to $F$, and where $F'$ is any
alternative c.d.f.

We shall consider first the case that $F$ is the uniform distribution on (0, 1),
that $I = (u_0, u_1)$ is any subinterval of (0, 1), and that $F'$ is any continuous
c.d.f. such that $F'(u) = F(u) + \Delta$ for $u_0 < u < u_1$ and $F'(u) = F(u)$ for
$u < u_0 - \Delta$ and $u > u_1 + \Delta$. On dividing by $\Delta$, and letting $\Delta$ tend to zero,
(8.4) is then seen to reduce to


$\int_I a(u)P_u\{Y(u) > F(u)\}\, d\mu(u) \le \int_I b(u)P_u\{Y(u) \le F(u)\}\, d\mu(u)$,


and since this holds for all $I$, to


$P_u\{Y(u) > F(u)\} \le \frac{b(u)}{a(u) + b(u)} = \alpha(u)$  a.e.


On letting $F'$ tend to $F$ from below one finds similarly

$P_u\{Y(u) \ge F(u)\} \ge \alpha(u)$  a.e.


In exactly the same manner these two inequalities are seen to hold also for any $F$
belonging to the family $\mathcal{F}$ of mixtures of uniform distributions over nonoverlapping
intervals. By considering a countable dense subset of $\mathcal{F}$, for example mixtures
with rational weights of uniform distributions over intervals with rational
endpoints, it is seen that there exists a null set $N$ such that for any $u \notin N$ the
two inequalities hold for all $F \in \mathcal{F}$, and it follows by an argument similar to that
given in (i) of this section that for all $u \notin N$, and all $F \in \mathcal{F}$


$P_{u,F}\{Y(u) > F(u)\} = \alpha(u)$.


For $F \in \mathcal{F}$ and any fixed $u_0 \notin N$, the common boundary points of $\omega(u_0, p_0)$:
$F(u_0) \le p_0$ and $\omega^{-1}(u_0, p_0)$ are exactly the distributions of $\mathcal{F}$ for which $F(u_0) =
p_0$. For these we then have


$P_{u_0,F}\{Y(u_0) > p_0\} = \alpha(u_0)$,


and since the left-hand side is the probability of rejecting the hypothesis $H(u_0, p_0)$,
this completes the proof of (8.1). The desired optimum property of the procedure
now follows from the fact, proved by Fraser [4], [5], that for the family
of distributions $\mathcal{F}$, the one-sided sign test uniformly minimizes the probabilities
of error among all unbiased tests of the hypothesis $H(u_0, p_0)$.


REFERENCES

[1] Om P. Aggarwal, "Some minimax invariant procedures for estimating a cumulative
distribution function," Ann. Math. Stat., Vol. 26 (1955), pp. 450-463.

[2] R. C. Bose and S. N. Roy, "Simultaneous confidence interval estimation," Ann. Math.
Stat., Vol. 24 (1953), pp. 513-536.

[3] David B. Duncan, "Multiple range and multiple F-tests," Biometrics, Vol. 11 (1955),
pp. 1-42.

[4] D. A. S. Fraser, "Completeness of order statistics," Canadian J. Math., Vol. 6 (1953),
pp. 42-45.

[5] D. A. S. Fraser, "Non-parametric theory: Scale and location parameters," Canadian
J. Math., Vol. 6 (1953), pp. 46-68.

[6] H. O. Hartley, "Maximum F ratio as a short-cut test for heterogeneity of variance,"
Biometrika, Vol. 37 (1950), pp. 308-312.

[7] M. G. Kendall, "Further contributions to the theory of paired comparisons,"
Biometrics, Vol. 11 (1955), pp. 43-62.

[8] E. L. Lehmann, "Some principles of the theory of testing hypotheses," Ann. Math.
Stat., Vol. 21 (1950), pp. 1-26.

[9] E. L. Lehmann, "A general concept of unbiasedness," Ann. Math. Stat., Vol. 22 (1951),
pp. 587-592.

[10] E. L. Lehmann, "Testing multiparameter hypotheses," Ann. Math. Stat., Vol. 23
(1952), pp. 541-552.

[11] E. L. Lehmann and Henry Scheffé, "Completeness, similar regions, and unbiased
estimation, Part II," Sankhyā, Vol. 15 (1955), pp. 219-236.

[12] S. N. Roy, "On a heuristic method of test construction and its use in multivariate
analysis," Ann. Math. Stat., Vol. 24 (1953), pp. 220-238.

[13] Henry Scheffé, "A useful convergence theorem for probability distributions," Ann.
Math. Stat., Vol. 18 (1947), pp. 434-438.

[14] J. W. Tukey, "Quick and dirty methods in statistics, Part II, Simple analyses for
standard designs," Proc. Fifth Annual Convention, Am. Soc. for Quality Control
(1951), pp. 189-197.





ESTIMATES FOR GLOBAL CENTRAL LIMIT THEOREMS¹


By RALPH PALMER AGNEW
Cornell University
1. Introduction. Let $\xi_1, \xi_2, \ldots$ be independent random variables having the same d.f. (distribution function) $F(x)$. Thus, for each $k = 1, 2, 3, \ldots$,

(1.1)  $\Pr\{\xi_k \le x\} = F(x), \qquad -\infty < x < \infty,$

where $F(x)$ is a real monotone increasing function for which $F(-\infty) = 0$ and $F(\infty) = 1$. Let $\varphi(t)$, defined by

(1.2)  $\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x),$

denote the c.f. (characteristic function) of $F(x)$. We suppose that

(1.3)  $\int_{-\infty}^{\infty} x\,dF(x) = 0, \qquad \int_{-\infty}^{\infty} x^2\,dF(x) = 1,$

so that $F(x)$ has mean 0 and standard deviation 1.

The d.f. $F^{(n)}(x)$ and the c.f. $\varphi^{(n)}(t)$ of the sum $\xi_1 + \xi_2 + \cdots + \xi_n$ are such that $\varphi^{(n)}(t) = [\varphi(t)]^n$ and hence

(1.4)  $[\varphi(t)]^n = \int_{-\infty}^{\infty} e^{itx}\,dF^{(n)}(x).$

The d.f. of the combination

(1.5)  $\dfrac{\xi_1 + \xi_2 + \cdots + \xi_n}{n^{1/2}}$

is then $F^{(n)}(n^{1/2}x)$ and we denote this by $F_n(x)$. Its c.f. is $[\varphi(n^{-1/2}t)]^n$, so that

(1.51)  $[\varphi(n^{-1/2}t)]^n = \int_{-\infty}^{\infty} e^{itx}\,dF_n(x).$

The hypotheses (1.3) imply that the formulas (1.3) hold when $F(x)$ is replaced by $F_n(x)$. A special case of the central limit theorem asserts that, for each individual $x$ in the interval $-\infty < x < \infty$,

(1.6)  $\lim_{n\to\infty} F_n(x) = \Phi(x),$

where $\Phi(x)$ is the Gaussian d.f. defined by

(1.61)  $\Phi(x) = \dfrac{1}{(2\pi)^{1/2}} \int_{-\infty}^{x} e^{-u^2/2}\,du.$

For an exposition of the above facts, see Cramér [2].


Received January 17, 1955; revised August 20, 1956. 
1 This research was supported by the United States Air Force under Contract No 
(600)-685 monitored by the Office of Scientific Research. 


It was recently shown by the author [1] that if $p > 1/2$, then

(1.7)  $\lim_{n\to\infty} \int_{-\infty}^{\infty} |F_n(x) - \Phi(x)|^p\,dx = 0.$

For each $p > 1/2$, (1.7) is a global version of the central limit theorem which complements the local version (1.6) in which values of $x$ are considered one at a time. In fact, it was shown in [1] that it is possible to pass from one to the other of (1.6) and (1.7) by applications of theorems on convergence of sequences of d.f.'s. However, [1] did not provide a means of calculating the numbers $C_n^{(p)}$ defined by

(1.8)  $C_n^{(p)} = \int_{-\infty}^{\infty} |F_n(x) - \Phi(x)|^p\,dx$

and of determining the rapidity of the convergence to 0 of $C_n^{(p)}$ as $n \to \infty$.

Section 2 gives optimal inequalities satisfied by d.f.'s having mean 0 and standard deviation 1. Section 3 gives a formula for the constants $C_n^{(2)}$ for the interesting case in which $p = 2$. Section 4 shows that if $F(x)$ is the symmetric binomial d.f., then

(1.9)  $C_n^{(2)} = \dfrac{1}{n}\,\dfrac{1}{6\pi^{1/2}} + O\!\left(\dfrac{1}{n^2}\right).$

Section 5 shows that if $F(x)$ is the d.f. of a random variable uniformly distributed over the interval $-3^{1/2} < x < 3^{1/2}$, then $C_n^{(2)}$ converges to 0 much more rapidly because in this case

(1.91)  $C_n^{(2)} = \dfrac{1}{n^2}\,\dfrac{3}{1280\pi^{1/2}} + O\!\left(\dfrac{1}{n^3}\right).$

Finally, inequalities are given for appraisal of the constants in (1.91) when $n$ is fixed and not necessarily large.


2. Some optimal inequalities. In order to obtain the formula for $C_n^{(2)}$ given in Section 3 we need estimates of differences of d.f.'s satisfying (1.3). While estimates given in [1] would serve our purpose, it is of interest to know the best estimates and we proceed to derive them. We start with the following known theorem.

THEOREM 2.1. If $F(x)$ is a d.f. for which

(2.11)  $\int_{-\infty}^{\infty} x\,dF(x) = 0, \qquad \int_{-\infty}^{\infty} x^2\,dF(x) = 1,$

then

(2.12)  $0 \le F(x) \le \dfrac{1}{1+x^2}, \qquad x \le 0,$

and

(2.13)  $1 - \dfrac{1}{1+x^2} \le F(x) \le 1, \qquad x \ge 0.$

Moreover the function $(1 + x^2)^{-1}$ is the least function such that (2.12) and (2.13) hold whenever $F(x)$ is a d.f. satisfying (2.11).

This theorem gives a special case of a Tchebycheff inequality for bounds of d.f.'s having prescribed moments; for a recent treatment of the subject and for references to literature, see Royden [3]. An unmotivated proof of the theorem can be given in a few lines as follows: Let $x_0$ be a positive value of $x$ for which $F(x)$ is continuous and let $y_0 = F(x_0)$. For each constant $c$ for which $c \le 0$ we obtain, with the aid of (2.11),

(2.14)  $0 \le \int_{-\infty}^{x_0} (x - c)^2\,dF(x) = 1 - \int_{x_0}^{\infty} x^2\,dF(x) + 2c\int_{x_0}^{\infty} x\,dF(x) + c^2\int_{-\infty}^{x_0} dF(x) \le 1 - x_0^2(1 - y_0) + 2cx_0(1 - y_0) + c^2 y_0.$

Clearly $y_0 > 0$, because if $y_0 = 0$, then $F(x) = 0$ when $x \le 0$, and (2.11) is violated. Hence we can put

(2.15)  $c = -x_0(1 - y_0)/y_0$

in (2.14), and find that $y_0 \ge 1 - 1/(1 + x_0^2)$. Thus (2.13) holds wherever $F(x)$ is positive and continuous and hence wherever $x \ge 0$. To prove (2.12), we apply (2.13) to the d.f. $[1 - F(-x)]$. The last part of the theorem follows from the fact that if $F(x) = 0$ when $x < -x_0^{-1}$,

(2.16)  $F(x) = 1 - \dfrac{1}{1 + x_0^2}, \qquad -x_0^{-1} \le x < x_0,$

and $F(x) = 1$ when $x \ge x_0$, then $F(x)$ is a d.f. satisfying (2.11).

THEOREM 2.2. If $F(x)$ and $G(x)$ are two d.f.'s for which (2.11) and

(2.21)  $\int_{-\infty}^{\infty} x\,dG(x) = 0, \qquad \int_{-\infty}^{\infty} x^2\,dG(x) = 1$

hold, then

(2.22)  $|F(x) - G(x)| \le \dfrac{1}{1 + x^2}, \qquad -\infty < x < \infty.$

Moreover the function $(1 + x^2)^{-1}$ is the least function such that (2.22) holds whenever $F(x)$ and $G(x)$ are d.f.'s satisfying (2.11) and (2.21).

The conclusion (2.22) follows from (2.12), (2.13), and the analogous inequalities obtained by replacing $F(x)$ by $G(x)$. To prove the last part of the theorem, let $x_0 > 0$. Let $\epsilon > 0$. It follows from Theorem 2.1 that there is a d.f. $F(x)$ satisfying (2.11) for which

(2.23)  $F(x_0) < 1 - \dfrac{1}{1 + x_0^2} + \dfrac{\epsilon}{2}.$

If $h > 0$ and $G(x) = 0$ when $x < -(2h)^{-1/2}$, $G(x) = h$ when $-(2h)^{-1/2} \le x < 0$, $G(x) = 1 - h$ when $0 \le x < (2h)^{-1/2}$, and $G(x) = 1$ when $x \ge (2h)^{-1/2}$, then $G(x)$ is a d.f. satisfying (2.21). By making $0 < h < \epsilon/2$, we obtain

(2.24)  $G(x_0) > 1 - \epsilon/2.$

Hence

(2.25)  $|F(x_0) - G(x_0)| > \dfrac{1}{1 + x_0^2} - \epsilon.$

It follows that (2.22) cannot be improved when $x > 0$, and a slight modification of the argument shows that it cannot be improved when $x = 0$. That (2.22) cannot be improved when $x < 0$ follows from the fact that the inequality

$\bigl|[1 - F(-x)] - [1 - G(-x)]\bigr| \le (1 + x^2)^{-1}$

cannot be improved when $x > 0$. Thus Theorem 2.2 is proved.
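The extremal two-point distribution used at the end of the proof of Theorem 2.1 can be checked with exact rational arithmetic. The sketch below is only an illustration; the function names and the sample value x0 = 3/2 are choices made here, not taken from the paper.

```python
from fractions import Fraction

def two_point_extremal(x0):
    """Atoms (location, mass) of the d.f. with mass at -1/x0 and at x0."""
    x0 = Fraction(x0)
    p_right = 1 / (1 + x0 ** 2)      # mass placed at x0
    p_left = 1 - p_right             # mass placed at -1/x0
    return [(-1 / x0, p_left), (x0, p_right)]

def moments(atoms):
    """First and second moments of a discrete distribution."""
    mean = sum(x * p for x, p in atoms)
    second = sum(x * x * p for x, p in atoms)
    return mean, second

def cdf(atoms, x):
    """Right-continuous distribution function of the discrete distribution."""
    return sum(p for a, p in atoms if a <= x)

atoms = two_point_extremal(Fraction(3, 2))
print(moments(atoms))                                   # mean and second moment
print(cdf(atoms, Fraction(3, 2) - Fraction(1, 1000)))   # value just below x0
```

Just below x0 the distribution function equals 1 - 1/(1 + x0^2), so the lower bound of Theorem 2.1 is attained there while the mean and second moment stay at 0 and 1.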

In case $G(x)$ is the Gaussian d.f. $\Phi(x)$, it is possible to replace the right member of (2.22) by a smaller function of $x$. Theorem 2.1 and the rather crude fact that if $x < 0$, then

(2.3)  $[(1 + x^2)^{-1} - \Phi(x)] \ge [\Phi(x) - 0],$

that is,

(2.31)  $\Phi(x) \le \dfrac{1}{2}\,\dfrac{1}{1 + x^2},$

imply that if $F(x)$ satisfies (2.11) then

(2.32)  $|F(x) - \Phi(x)| \le \dfrac{1}{1 + x^2} - \Phi(x), \qquad x \le 0.$

For $x > 0$, the corresponding inequality is

(2.33)  $|F(x) - \Phi(x)| \le \dfrac{1}{1 + x^2} - [1 - \Phi(x)], \qquad x > 0.$

The right members of (2.32) and (2.33) cannot be replaced by smaller functions of $x$. If $F(x)$ satisfies (2.11), then (2.32) and (2.33) can be used to show that

(2.34)  $\int_{-\infty}^{\infty} |F(x) - \Phi(x)|\,dx \le 2\int_{-\infty}^{0} \left[\dfrac{1}{1 + x^2} - \Phi(x)\right] dx = \pi - (2/\pi)^{1/2} = 2.3437\ldots.$

It should be expected that the estimate in (2.34) is rather crude, but the last member cannot be reduced below $(2/\pi)^{1/2} = .79788\ldots$ because if $F_a(x)$ is, for each $a > 1$, the d.f. satisfying (2.11) for which $F_a(x) = 0$ when $x < -a$, $F_a(x) = 1/2a^2$ when $-a \le x < 0$, $F_a(x) = 1 - 1/2a^2$ when $0 \le x < a$, and $F_a(x) = 1$ when $x \ge a$, then

(2.35)  $\lim_{a\to\infty} \int_{-\infty}^{\infty} |F_a(x) - \Phi(x)|\,dx = 2\int_{-\infty}^{0} \Phi(x)\,dx = \left(\dfrac{2}{\pi}\right)^{1/2}.$
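The constants in this passage are easy to confirm numerically. The following sketch, which uses only the standard library, checks the crude bound Phi(x) <= 1/(2(1 + x^2)) on a grid of negative x and compares a Simpson-rule value of the bounding integral with the closed form pi - (2/pi)^{1/2}. The helper names, the grid, and the truncation point are assumptions made here for illustration.

```python
import math

def Phi(x):
    """Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simpson(f, a, b, m):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

# Largest value of Phi(x) - 1/(2(1+x^2)) over x = 0, -0.01, ..., -20.
worst = max(Phi(-0.01 * k) - 0.5 / (1.0 + (0.01 * k) ** 2) for k in range(2001))

# Twice the integral of 1/(1+x^2) - Phi(x) over x <= 0; the part of
# 1/(1+x^2) beyond -CUT is added in closed form, that of Phi is negligible.
CUT = 40.0
numeric = 2.0 * (simpson(lambda x: 1.0 / (1.0 + x * x) - Phi(x), -CUT, 0.0, 8000)
                 + (math.pi / 2.0 - math.atan(CUT)))
closed = math.pi - math.sqrt(2.0 / math.pi)
print(worst, numeric, closed)
```

The maximum on the grid is attained at x = 0, where the two sides of the crude bound coincide.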

While discussing Theorem 2.1 with the author, Aryeh Dvoretzky remarked that similar but simpler considerations should produce inequalities better than (2.12) and (2.13) when $F(x)$ is a symmetric d.f. satisfying (2.11). A d.f. $F(x)$ is called symmetric if $F(x) - \tfrac12 = \tfrac12 - F(-x)$, or $F(x) + F(-x) = 1$, for each $x$ for which $F(x)$ is continuous. A symmetric d.f. $F(x)$ satisfies (2.11) if and only if

(2.4)  $\int_{0}^{\infty} x^2\,dF(x) = \tfrac12.$
We prove the following theorem. 
THEOREM 2.5. If $F(x)$ is a symmetric d.f. for which

(2.51)  $\int_{-\infty}^{\infty} x\,dF(x) = 0, \qquad \int_{-\infty}^{\infty} x^2\,dF(x) = 1,$

then

(2.52)  $0 \le F(x) \le 1/2x^2, \qquad x \le -1,$

(2.53)  $0 \le F(x) \le \tfrac12, \qquad -1 \le x \le 0,$

(2.54)  $\tfrac12 \le F(x) \le 1, \qquad 0 \le x \le 1,$

(2.55)  $1 - 1/2x^2 \le F(x) \le 1, \qquad x \ge 1.$

Moreover the functions $1/2x^2$ and $\tfrac12$ are optimal functions for which (2.52), (2.53), (2.54), and (2.55) hold wherever $F(x)$ is a symmetric d.f. satisfying (2.51).

Let $x_0$ be a positive value of $x$ for which $F(x)$ is continuous. Then $F(x_0) \ge F(-x_0) = 1 - F(x_0)$ and hence $F(x_0) \ge \tfrac12$. Thus (2.54) holds, and the lower bound cannot be increased because if $F(x) = 0$ when $x < -1$, $F(x) = \tfrac12$ when $-1 \le x < 1$, and $F(x) = 1$ when $x \ge 1$, then $F(x)$ is a symmetric d.f. satisfying (2.51). We find also that

(2.56)  $1 \ge 2\int_{x_0}^{\infty} x^2\,dF(x) \ge 2x_0^2 \int_{x_0}^{\infty} dF(x) = 2x_0^2[1 - F(x_0)],$

and hence $F(x_0) \ge 1 - \tfrac12 x_0^{-2}$. This implies (2.55). The left member of (2.55) cannot be increased because if $x_0 \ge 1$ and $F(x) = 0$ when $x < -x_0$, $F(x) = 1/2x_0^2$ when $-x_0 \le x < 0$, $F(x) = 1 - 1/2x_0^2$ when $0 \le x < x_0$, and $F(x) = 1$ when $x \ge x_0$, then $F(x)$ is a symmetric d.f. satisfying (2.51). The facts involving (2.52) and (2.53) follow from applying the facts involving (2.54) and (2.55) to the d.f. $[1 - F(-x)]$. Thus Theorem 2.5 is proved.

The next theorem can be obtained with the aid of Theorem 2.5 just as Theorem 
2.2 was obtained with the aid of Theorem 2.1. 

THEOREM 2.6. If $F(x)$ and $G(x)$ are symmetric d.f.'s for which (2.11) and (2.21) hold, then

(2.61)  $|F(x) - G(x)| \le \tfrac12, \qquad |x| \le 1,$

and

(2.62)  $|F(x) - G(x)| \le 1/2x^2, \qquad |x| \ge 1.$

Moreover $\tfrac12$ and $1/2x^2$ are the least functions of $x$ such that (2.61) and (2.62) hold whenever $F(x)$ and $G(x)$ are symmetric d.f.'s satisfying (2.11) and (2.21).


3. Formulas involving d.f.'s and c.f.'s. We now obtain some formulas in which we can replace $F(x)$ by $F_n(x)$ and $\varphi(t)$ by $[\varphi(n^{-1/2}t)]^n$. Use of (1.2) and the formula

(3.1)  $e^{-t^2/2} = \int_{-\infty}^{\infty} e^{itx}\,d\Phi(x)$

gives

$\varphi(t) - e^{-t^2/2} = \int_{-\infty}^{\infty} e^{itx}\,d[F(x) - \Phi(x)] = -\int_{-\infty}^{\infty} [F(x) - \Phi(x)]\,d_x e^{itx} = -it\int_{-\infty}^{\infty} [F(x) - \Phi(x)]e^{itx}\,dx,$

the calculations being valid because the integrals exist, $[F(x) - \Phi(x)] \to 0$ as $|x| \to \infty$, and $e^{itx}$ is absolutely continuous. Hence

(3.21)  $\dfrac{i}{(2\pi)^{1/2}}\,\dfrac{\varphi(t) - e^{-t^2/2}}{t} = \dfrac{1}{(2\pi)^{1/2}} \int_{-\infty}^{\infty} [F(x) - \Phi(x)]e^{itx}\,dx.$

Since (1.3) holds and (2.21) holds when $G(x) = \Phi(x)$, it follows from Theorem 2.2 that, for each $p > 0$,

(3.3)  $|F(x) - \Phi(x)|^p \le \dfrac{1}{(1 + x^2)^p}, \qquad -\infty < x < \infty.$

The function in the right member of (3.3) being integrable over $-\infty < x < \infty$ when $p > 1/2$, it follows that $[F(x) - \Phi(x)]$ belongs to class $L_p$ for each $p > 1/2$. Since $[F(x) - \Phi(x)]$ belongs to class $L_2$ we can use (3.21) and the inversion formulas for Fourier transforms to obtain

(3.4)  $F(x) - \Phi(x) = \dfrac{1}{(2\pi)^{1/2}} \int_{-\infty}^{\infty} \dfrac{i}{(2\pi)^{1/2}}\,\dfrac{\varphi(t) - e^{-t^2/2}}{t}\,e^{-ixt}\,dt.$

Use of either (3.21) or (3.4) and the Parseval formula for Fourier transforms gives

(3.5)  $\int_{-\infty}^{\infty} |F(x) - \Phi(x)|^2\,dx = \dfrac{1}{2\pi} \int_{-\infty}^{\infty} \left|\dfrac{\varphi(t) - e^{-t^2/2}}{t}\right|^2 dt.$






Because $[F(x) - \Phi(x)]$ belongs to class $L_1$, it follows from (3.21) that the left member of (3.21) is, when properly defined at $t = 0$, continuous over $-\infty < t < \infty$. Using (3.21) and (3.3) with $p = 1$ gives

(3.51)  $\left|\dfrac{\varphi(t) - e^{-t^2/2}}{t}\right| \le \int_{-\infty}^{\infty} \dfrac{dx}{1 + x^2} = \pi.$

Since (1.2) implies that $|\varphi(t)| \le 1$, we have also

(3.52)  $\left|\dfrac{\varphi(t) - e^{-t^2/2}}{t}\right| \le \dfrac{2}{|t|}.$

Therefore

(3.53)  $\left|\dfrac{\varphi(t) - e^{-t^2/2}}{t}\right|^2 \le h(t),$

where $h(t) = \pi^2$ when $|t| \le \pi/2$ and $h(t) = 4/|t|^2$ when $|t| \ge \pi/2$. This shows that the integrand in the right member of (3.5) is dominated by an integrable function independent of the particular d.f.'s and c.f.'s in (3.5). This fact and (3.5) show, without use of other facts relating d.f.'s $F_n(x)$ and their c.f.'s $\varphi_n(t)$, that if $\varphi_n(t) \to e^{-t^2/2}$ for each $t$, then

(3.6)  $\lim_{n\to\infty} \int_{-\infty}^{\infty} |F_n(x) - \Phi(x)|^2\,dx = 0;$

only the Lebesgue criterion of dominated convergence for taking limits under integral signs is needed to draw the conclusion (3.6). When the conclusion (3.6) has been attained, Theorems 3.2 and 3.1 of [1] become applicable to establish (1.6) and then (1.7) for each $p > 1/2$.

In terms of notation of the introduction, use of (3.4) and (3.5) gives

(3.7)  $F_n(x) - \Phi(x) = \dfrac{i}{2\pi} \int_{-\infty}^{\infty} \dfrac{[\varphi(n^{-1/2}t)]^n - e^{-t^2/2}}{t}\,e^{-ixt}\,dt$

and

(3.71)  $\int_{-\infty}^{\infty} |F_n(x) - \Phi(x)|^2\,dx = \dfrac{1}{2\pi} \int_{-\infty}^{\infty} \left|\dfrac{[\varphi(n^{-1/2}t)]^n - e^{-t^2/2}}{t}\right|^2 dt.$

It is frequently convenient to make a change of the variable of integration in the right member of (3.71) by replacing $t$ by $n^{1/2}t$ so that (3.71) becomes

(3.72)  $\int_{-\infty}^{\infty} |F_n(x) - \Phi(x)|^2\,dx = \dfrac{1}{n^{1/2}}\,\dfrac{1}{2\pi} \int_{-\infty}^{\infty} \left|\dfrac{[\varphi(t)]^n - e^{-nt^2/2}}{t}\right|^2 dt.$

Since (1.2) implies that $\varphi(-t)$ and $\varphi(t)$ are conjugate complex numbers, the integrand in the right member of (3.72) is an even function of $t$ and therefore

(3.73)  $\int_{-\infty}^{\infty} |F_n(x) - \Phi(x)|^2\,dx = \dfrac{1}{n^{1/2}}\,\dfrac{1}{\pi} \int_{0}^{\infty} \left|\dfrac{[\varphi(t)]^n - (e^{-t^2/2})^n}{t}\right|^2 dt.$

Similar modifications of (3.5) and (3.71) can be made.







There are numerous formulas for inverting the Fourier-Stieltjes transformation

(3.8)  $\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x),$

that is, formulas giving $F(x)$ in terms of $\varphi(t)$. Some of these formulas are given in a book [2] of Cramér. In cases where it is known that $F(x)$ is a d.f. satisfying (1.3), the formula obtained from (3.4), that is

(3.81)  $F(x) = \Phi(x) + \dfrac{i}{2\pi} \int_{-\infty}^{\infty} \dfrac{\varphi(t) - e^{-t^2/2}}{t}\,e^{-ixt}\,dt,$

may be most convenient when one wishes to study the difference between $F(x)$ and $\Phi(x)$. In particular, it may be true that (3.71) or (3.73) is the most fruitful source of information about the left member.


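4. The symmetric binomial d.f. It can be expected that, at least for the case $p = 2$, considerable information can be obtained about the constants $C_n^{(p)}$ in (1.8) for special d.f. $F(x)$ and for classes of d.f. $F(x)$ having 3 or more moments satisfying specified conditions. Before more extensive investigations are undertaken, it seems desirable to have rather precise information about the behavior of the constants $C_n^{(2)}$ for the case in which $F(x)$ is the symmetric binomial d.f. usually associated with games of heads and tails. Thus we let $F(x) = 0$ when $x < -1$, $F(x) = \tfrac12$ when $-1 \le x < 1$, and $F(x) = 1$ when $x \ge 1$ and seek information about the constants $B_n$ defined by

(4.1)  $B_n = \int_{-\infty}^{\infty} |F_n(x) - \Phi(x)|^2\,dx.$

Since $\varphi(t) = \cos t$, it follows from (3.73) that

(4.11)  $B_n = \dfrac{1}{n^{1/2}\pi} \int_{0}^{\infty} \dfrac{[(e^{-t^2/2})^n - (\cos t)^n]^2}{t^2}\,dt.$

Therefore

(4.12)  $B_n = B_n^{(1)} + B_n^{(2)} + B_n^{(3)} + B_n^{(4)},$

where

(4.13)  $B_n^{(1)} = \dfrac{1}{n^{1/2}\pi} \int_{\pi/2}^{\infty} \dfrac{e^{-nt^2}}{t^2}\,dt,$

(4.14)  $B_n^{(2)} = \dfrac{1}{n^{1/2}\pi} \int_{\pi/2}^{\infty} \dfrac{-2(\cos t)^n e^{-nt^2/2}}{t^2}\,dt,$

(4.15)  $B_n^{(3)} = \dfrac{1}{n^{1/2}\pi} \int_{0}^{\pi/2} \dfrac{[(e^{-t^2/2})^n - (\cos t)^n]^2}{t^2}\,dt,$

(4.16)  $B_n^{(4)} = \dfrac{1}{n^{1/2}\pi} \int_{\pi/2}^{\infty} \dfrac{(\cos t)^{2n}}{t^2}\,dt.$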







As we shall show by giving more precise estimates, $B_n^{(1)}$ and $B_n^{(2)}$ approach zero with exponential rapidity, $B_n^{(3)}$ is of order at most $1/n^2$, and $B_n^{(4)}$ is of order $1/n$ so that $B_n$ is of order $1/n$.

We find that 


0 < BY 
(4.21) 


and similarly 
(4.22) 


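To estimate $B_n^{(3)}$ we use the inequality

(4.23)  $0 < \dfrac{e^{-t^2/2} - \cos t}{t^4} < \dfrac{1}{12}, \qquad 0 < t < \dfrac{\pi}{2}.$

One way to prove (4.23) is to use the elementary power series expansions of $e^x$ and $\cos x$ to find that the function in (4.23) has a power series expansion $\sum a_k t^{2k}$ which is, when $0 < t < \pi/2$, an alternating series converging to a positive number less than $a_0$ which is $\tfrac{1}{12}$. Using (4.23) and setting momentarily $b = e^{-t^2/2}$, $a = \cos t$, we find that $a < b$ and hence

(4.24)  $0 < b^n - a^n = (b - a)\sum_{k=0}^{n-1} b^k a^{n-1-k} < (b - a)\sum_{k=0}^{n-1} b^{n-1},$

so that

(4.25)  $0 < b^n - a^n < n(b - a)b^{n-1},$

and hence

(4.26)  $0 < \dfrac{(e^{-t^2/2})^n - (\cos t)^n}{t} < \dfrac{n}{12}\,t^3 e^{-(n-1)t^2/2}.$

This and (4.15) imply that

(4.27)  $B_n^{(3)} < \dfrac{1}{n^{1/2}\pi} \int_{0}^{\pi/2} \dfrac{n^2}{144}\,t^6 e^{-(n-1)t^2}\,dt$

and hence

(4.28)  $B_n^{(3)} < \dfrac{n^{3/2}}{144\pi} \int_{0}^{\pi/2} t^6 e^{-(n-1)t^2}\,dt.$

Making a change of the variable of integration in (4.28) gives

(4.29)  $B_n^{(3)} < \left(\dfrac{n}{n-1}\right)^{7/2} \dfrac{1}{n^2}\,\dfrac{1}{144\pi} \int_{0}^{(n-1)^{1/2}\pi/2} u^6 e^{-u^2}\,du.$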




Using the standard formula

(4.291)  $\int_{0}^{\infty} t^{2k} e^{-t^2}\,dt = \dfrac{(2k)!}{2^{2k+1}\,k!}\,\pi^{1/2},$

with $k = 3$, gives

(4.292)  $B_n^{(3)} < \left(\dfrac{n}{n-1}\right)^{7/2} \dfrac{5}{768\pi^{1/2}}\,\dfrac{1}{n^2}.$
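The standard Gaussian moment formula, $\int_0^\infty t^{2k}e^{-t^2}\,dt = (2k)!\,\pi^{1/2}/(2^{2k+1}k!)$, can be verified directly for k = 3. The sketch below computes the rational factor exactly and compares it with a Simpson-rule value of the integral. The function names, the truncation point 12, and the step count are assumptions made for this illustration.

```python
import math
from fractions import Fraction

def gaussian_even_moment_coeff(k):
    """Rational c_k with  integral_0^inf t^(2k) exp(-t^2) dt = c_k * sqrt(pi)."""
    return Fraction(math.factorial(2 * k), 2 ** (2 * k + 1) * math.factorial(k))

def numeric_moment(k, upper=12.0, m=12000):
    """Simpson approximation of the same integral, truncated at `upper`."""
    h = upper / m
    f = lambda t: t ** (2 * k) * math.exp(-t * t)
    total = f(0.0) + f(upper)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

c3 = gaussian_even_moment_coeff(3)
print(c3, float(c3) * math.sqrt(math.pi), numeric_moment(3))
# Dividing by 144 (the square of the constant 12 in the bound on
# (exp(-t^2/2) - cos t)/t^4) gives the rational part of the resulting bound.
print(c3 / 144)
```

For k = 3 the rational factor is 15/16, and 15/16 divided by 144 reduces to 5/768.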


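To estimate $B_n^{(4)}$, we start with (4.16) to obtain

$B_n^{(4)} = \dfrac{1}{n^{1/2}\pi} \sum_{k=1}^{\infty} \int_{k\pi - \pi/2}^{k\pi + \pi/2} \dfrac{(\cos t)^{2n}}{t^2}\,dt = \dfrac{1}{n^{1/2}\pi} \sum_{k=1}^{\infty} \int_{-\pi/2}^{\pi/2} \dfrac{(\cos t)^{2n}}{(k\pi + t)^2}\,dt = \dfrac{1}{n^{1/2}\pi} \int_{-\pi/2}^{\pi/2} S(t)(\cos t)^{2n}\,dt,$

where

$S(t) = \sum_{k=1}^{\infty} \dfrac{1}{(k\pi + t)^2}.$

Moreover

$\int_{-\pi/2}^{0} S(t)(\cos t)^{2n}\,dt = \int_{0}^{\pi/2} S(-t)(\cos t)^{2n}\,dt.$

Therefore,

(4.4)  $B_n^{(4)} = \dfrac{1}{n^{1/2}\pi} \int_{0}^{\pi/2} S_1(t)(\cos t)^{2n}\,dt,$

where $S_1(t) = S(t) + S(-t)$ and hence

(4.41)  $S_1(t) = \sum_{k=1}^{\infty} \left[\dfrac{1}{(k\pi + t)^2} + \dfrac{1}{(k\pi - t)^2}\right].$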


The function $S_1(t)$ is an even function which is analytic except for poles at the points $\pm\pi, \pm 2\pi, \ldots$. Its derivatives are easily calculated from (4.41); the first one is

(4.42)  $S_1'(t) = 2\sum_{k=1}^{\infty} \left[\dfrac{1}{(k\pi - t)^3} - \dfrac{1}{(k\pi + t)^3}\right].$

This shows that $S_1'(0) = 0$ and that $S_1'(t) > 0$ when $0 < t < \pi$, so that $S_1(t)$ is increasing over the interval $0 \le t \le \pi/2$ in which we are interested. Use of (4.41) and the formulas

(4.43)  $\sum_{k=1}^{\infty} \dfrac{1}{k^2} = \dfrac{\pi^2}{6}, \qquad \sum_{k=1}^{\infty} \dfrac{1}{(2k-1)^2} = \dfrac{\pi^2}{8}$

shows that

(4.44)  $S_1(0) = \dfrac13, \qquad S_1\!\left(\dfrac{\pi}{2}\right) = 1 - \dfrac{4}{\pi^2}.$

While we shall not use the fact, we nevertheless pause to remark that $S_1(t)$ is an elementary function. Use of (4.41) gives

(4.45)  $S_1(t) = \sum_{k=1}^{\infty} \dfrac{d}{dt}\left[\dfrac{-1}{k\pi + t} + \dfrac{1}{k\pi - t}\right] = \dfrac{d}{dt} \sum_{k=1}^{\infty} \dfrac{2t}{k^2\pi^2 - t^2}.$

But, as is shown in textbooks on series,

(4.46)  $\sum_{k=1}^{\infty} \dfrac{1}{k^2\pi^2 - t^2} = \dfrac{1 - t\cot t}{2t^2} = \dfrac{\sin t - t\cos t}{2t^2 \sin t}$

when $t \ne 0, \pm\pi, \pm 2\pi, \ldots$. Therefore, for these values of $t$,

(4.47)  $S_1(t) = \dfrac{d}{dt}\left(\dfrac{1}{t} - \cot t\right) = \dfrac{1}{\sin^2 t} - \dfrac{1}{t^2}.$
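The series (4.41) for S_1(t) and its elementary closed form, d/dt (1/t - cot t) = 1/sin^2 t - 1/t^2, can be compared numerically. The sketch below truncates the series at a fixed number of terms (an arbitrary choice made here; the truncation error is roughly 2/(pi^2 K) for K terms) and also checks the two special values at t = 0 and t = pi/2.

```python
import math

def s1_series(t, terms=200000):
    """Partial sum of S_1(t) = sum_k [1/(k*pi + t)^2 + 1/(k*pi - t)^2]."""
    return sum(1.0 / (k * math.pi + t) ** 2 + 1.0 / (k * math.pi - t) ** 2
               for k in range(1, terms + 1))

def s1_closed(t):
    """Closed form 1/sin(t)^2 - 1/t^2, valid for t not a multiple of pi."""
    return 1.0 / math.sin(t) ** 2 - 1.0 / t ** 2

print(s1_series(1.0), s1_closed(1.0))
print(s1_series(0.0), 1.0 / 3.0)
print(s1_closed(math.pi / 2.0), 1.0 - 4.0 / math.pi ** 2)
```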

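Using (4.4) and the fact that $S_1(0) = \tfrac13$, we obtain

(4.5)  $B_n^{(4)} = B_n^{(5)} + B_n^{(6)},$

where

(4.51)  $B_n^{(5)} = \dfrac{1}{n^{1/2}}\,\dfrac{1}{3\pi} \int_{0}^{\pi/2} (\cos t)^{2n}\,dt,$

(4.52)  $B_n^{(6)} = \dfrac{1}{n^{1/2}}\,\dfrac{1}{\pi} \int_{0}^{\pi/2} [S_1(t) - S_1(0)](\cos t)^{2n}\,dt.$

Since $S_1(t)$ is increasing over $0 < t < \pi/2$, we see that $B_n^{(6)} > 0$ and hence that $B_n^{(4)} > B_n^{(5)}$. The integral in (4.51) is elementary and well known. In fact

(4.53)  $\int_{0}^{\pi/2} (\cos t)^{2n}\,dt = \dfrac{(2n)!}{2^{2n}\,n!\,n!}\,\dfrac{\pi}{2} = \dfrac{\pi^{1/2}}{2n^{1/2}}\left[1 - \dfrac{1}{8n} + O\!\left(\dfrac{1}{n^2}\right)\right].$

This and (4.51) give

(4.54)  $B_n^{(5)} = \dfrac{1}{n}\,\dfrac{1}{6\pi^{1/2}}\left[1 - \dfrac{1}{8n} + O\!\left(\dfrac{1}{n^2}\right)\right].$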
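The exact value of the integral of (cos t)^{2n} over (0, pi/2), namely (2n)!/(2^{2n} n! n!) times pi/2, can be compared with the two-term asymptotic form (pi^{1/2}/(2 n^{1/2}))(1 - 1/(8n)). The sketch below does this with the standard library; the function names are chosen here for illustration only.

```python
import math

def wallis_exact(n):
    """Exact value of the integral of cos(t)^(2n) over 0 <= t <= pi/2."""
    return math.comb(2 * n, n) / 4 ** n * (math.pi / 2.0)

def wallis_asymptotic(n):
    """Two-term asymptotic form sqrt(pi)/(2 sqrt(n)) * (1 - 1/(8n))."""
    return math.sqrt(math.pi) / (2.0 * math.sqrt(n)) * (1.0 - 1.0 / (8.0 * n))

for n in (5, 50, 500):
    exact = wallis_exact(n)
    approx = wallis_asymptotic(n)
    # The relative error shrinks like a constant times 1/n^2.
    print(n, exact, approx, exact / approx - 1.0)
```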


To estimate $B_n^{(6)}$, we put the integral in (4.52) in the form $\int u(t)\,dv(t)$ where $u(t) = [S_1(t) - S_1(0)]/\sin t$ and $v(t) = -(\cos t)^{2n+1}/(2n + 1)$. Since $S_1'(0) = 0$, it follows that $u(t) \to 0$ as $t \to 0$. Hence integration by parts gives

(4.55)  $B_n^{(6)} = \dfrac{1}{n^{1/2}\pi}\,\dfrac{1}{2n + 1} \int_{0}^{\pi/2} u'(t)(\cos t)^{2n+1}\,dt.$

Properties of $S_1(t)$ imply that $u'(t)$ is bounded, say $|u'(t)| < M$, over $0 < t < \pi/2$. Since $|\cos t| \le 1$, it follows that

(4.56)  $|B_n^{(6)}| \le \dfrac{1}{n^{1/2}\pi}\,\dfrac{M}{2n + 1} \int_{0}^{\pi/2} (\cos t)^{2n}\,dt.$

Use of (4.53) gives

(4.57)  $|B_n^{(6)}| \le \dfrac{M}{2\pi^{1/2}\,n(2n + 1)}\left[1 + O\!\left(\dfrac{1}{n}\right)\right].$

Hence $B_n^{(6)} = O(n^{-2})$. Since also $B_n^{(1)} = O(n^{-2})$, $B_n^{(2)} = O(n^{-2})$, and $B_n^{(3)} = O(n^{-2})$, it follows from (4.54) that

(4.58)  $B_n = \dfrac{1}{n}\,\dfrac{1}{6\pi^{1/2}} + O\!\left(\dfrac{1}{n^2}\right).$

This is the result in (1.9) of the introduction.
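As a numerical sanity check on this asymptotic result, B_n can be evaluated straight from its definition as the integral of (F_n - Phi)^2, since F_n is a step function with jumps at the points (2j - n)/n^{1/2}. The sketch below integrates piecewise with Simpson's rule and prints n B_n next to the limiting constant 1/(6 pi^{1/2}). The helper names, the Simpson step counts, and the tail length of 10 are assumptions made for this illustration.

```python
import math

def Phi(x):
    """Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simpson(f, a, b, m=20):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def binomial_l2_distance(n):
    """Integral of (F_n - Phi)^2 for the normalized sum of n fair +/-1 steps."""
    root = math.sqrt(n)
    atoms = [(2 * j - n) / root for j in range(n + 1)]          # jump points
    probs = [math.comb(n, j) / 2 ** n for j in range(n + 1)]    # jump sizes
    # F_n is 0 left of the first atom and 1 right of the last atom.
    total = simpson(lambda x: Phi(x) ** 2, atoms[0] - 10.0, atoms[0], 200)
    total += simpson(lambda x: (1.0 - Phi(x)) ** 2, atoms[-1], atoms[-1] + 10.0, 200)
    level = 0.0
    for j in range(n):
        level += probs[j]            # value of F_n on [atoms[j], atoms[j+1])
        total += simpson(lambda x, c=level: (c - Phi(x)) ** 2, atoms[j], atoms[j + 1])
    return total

limit = 1.0 / (6.0 * math.sqrt(math.pi))
for n in (25, 100, 400):
    print(n, n * binomial_l2_distance(n), limit)
```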

We conclude this section with two remarks. It is well known that the d.f. $F_n(x)$ has jumps in the neighborhood of $x = 0$ asymptotically equal to $(2/\pi)^{1/2} n^{-1/2}$ and that the least upper bound $M_n$ of $|F_n(x) - \Phi(x)|$ is asymptotically half of these jumps, that is, $(2\pi)^{-1/2} n^{-1/2}$. The result in (4.58) shows that the set of values of $x$ for which $|F_n(x) - \Phi(x)|$ is near its least upper bound cannot have large measure even when $n$ is large. If we let $0 < \theta < 1$ and let $E(\theta)$ denote the set of values of $x$ for which $|F_n(x) - \Phi(x)| > \theta(2\pi)^{-1/2} n^{-1/2}$, then we have

$\dfrac{1}{n}\,\dfrac{1}{6\pi^{1/2}} + O\!\left(\dfrac{1}{n^2}\right) = \int_{-\infty}^{\infty} |F_n(x) - \Phi(x)|^2\,dx \ge \int_{E(\theta)} \dfrac{\theta^2}{2\pi n}\,dx = \dfrac{\theta^2}{2\pi n}\,|E(\theta)|,$

where $|E(\theta)|$ is the measure of $E(\theta)$, and hence

(4.61)  $|E(\theta)| \le \pi^{1/2}/3\theta^2 + O(n^{-1}).$

The average root-mean-square deviation of $F_n(x)$ from $\Phi(x)$ over the interval $-n^{1/2} < x < n^{1/2}$, where $0 < F_n(x) < 1$, is

$\left[\dfrac{1}{2n^{1/2}} \int_{-n^{1/2}}^{n^{1/2}} |F_n(x) - \Phi(x)|^2\,dx\right]^{1/2} = \dfrac{1}{2^{1/2} n^{1/4}} \left[\dfrac{1}{6\pi^{1/2} n} + O\!\left(\dfrac{1}{n^2}\right)\right]^{1/2} = \dfrac{1}{(12)^{1/2}\pi^{1/4} n^{3/4}} \left[1 + O\!\left(\dfrac{1}{n}\right)\right] = \dfrac{0.216831}{n^{3/4}} \left[1 + O\!\left(\dfrac{1}{n}\right)\right].$





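5. The uniform distribution. Let $U_1, U_2, U_3, \ldots$ be defined by

(5.1)  $U_n = \int_{-\infty}^{\infty} |F_n(x) - \Phi(x)|^2\,dx,$

where $F(x)$ is the d.f. of a random variable uniformly distributed over $-a \le x \le a$. Thus $F(x) = 0$ when $x \le -a$, $F(x) = (x + a)/2a$ when $-a \le x \le a$, and $F(x) = 1$ when $x \ge a$. This d.f. has mean 0, and we assume that $a = 3^{1/2}$ so that the standard deviation is 1. The c.f. is

(5.2)  $\varphi(t) = (\sin at)/at,$

and it follows from (3.73) that

(5.3)  $U_n = \dfrac{1}{n^{1/2}\pi} \int_{0}^{\infty} \left|\left(\dfrac{\sin at}{at}\right)^n - e^{-nt^2/2}\right|^2 \dfrac{dt}{t^2}.$

To estimate $U_n$, we let

(5.31)  $\delta_n = n^{-1/2}\log n$

and set

(5.32)  $U_n = U_n^{(1)} + U_n^{(2)},$

where

(5.33)  $U_n^{(1)} = \dfrac{1}{n^{1/2}\pi} \int_{\delta_n}^{\infty} \left|\left(\dfrac{\sin at}{at}\right)^n - e^{-nt^2/2}\right|^2 \dfrac{dt}{t^2},$

(5.34)  $U_n^{(2)} = \dfrac{1}{n^{1/2}\pi} \int_{0}^{\delta_n} \left|\left(\dfrac{\sin at}{at}\right)^n - e^{-nt^2/2}\right|^2 \dfrac{dt}{t^2}.$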


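For each sufficiently large $n$ we have

(5.4)  $\left|\dfrac{\sin at}{at}\right| \le \left|\dfrac{\sin a\delta_n}{a\delta_n}\right|$

when $t \ge \delta_n$. Hence, for these values of $n$,

(5.41)  $U_n^{(1)} \le \dfrac{1}{n^{1/2}\pi} \int_{\delta_n}^{\infty} \left[\left|\dfrac{\sin a\delta_n}{a\delta_n}\right|^n + e^{-n\delta_n^2/2}\right]^2 \dfrac{dt}{t^2} \le \dfrac{2}{n^{1/2}\pi\delta_n} \left[\left(\dfrac{\sin a\delta_n}{a\delta_n}\right)^{2n} + e^{-n\delta_n^2}\right];$

in the last step we used the inequality $(x + y)^2 \le 2(x^2 + y^2)$. This and (5.31) imply that $U_n^{(1)} = O(n^{-k})$ for each $k$ and in particular that $U_n^{(1)} = O(n^{-3})$.

To estimate $U_n^{(2)}$, we begin by estimating the quantity

(5.5)  $\left|\left(\dfrac{\sin at}{at}\right)^n - e^{-nt^2/2}\right|$

in the integrand in (5.34). Since $a = 3^{1/2}$, we have

(5.52)  $\log \dfrac{\sin at}{at} = -\dfrac{t^2}{2} - \dfrac{t^4}{20} + O(t^6)$

and hence, for $0 \le t \le \delta_n$,

$\left(\dfrac{\sin at}{at}\right)^n = e^{-nt^2/2}\left[1 - \dfrac{nt^4}{20} + O(nt^6) + O(n^2t^8)\right],$

$\left|\left(\dfrac{\sin at}{at}\right)^n - e^{-nt^2/2}\right|^2 = e^{-nt^2}\left[\dfrac{n^2t^8}{400} + O(n^2t^{10}) + O(n^3t^{12})\right].$

Hence

(5.6)  $U_n^{(2)} = U_n^{(3)} + U_n^{(4)},$

where

(5.61)  $U_n^{(3)} = \dfrac{1}{n^{1/2}\pi} \int_{0}^{\delta_n} \dfrac{n^2t^6}{400}\,e^{-nt^2}\,dt,$

(5.62)  $U_n^{(4)} = \dfrac{1}{n^{1/2}\pi} \int_{0}^{\delta_n} [O(n^2t^8) + O(n^3t^{10})]\,e^{-nt^2}\,dt.$

Using (5.31) and making a change of the variable of integration in (5.61) gives

(5.63)  $U_n^{(3)} = \dfrac{1}{n^2}\,\dfrac{1}{400\pi} \int_{0}^{\log n} u^6 e^{-u^2}\,du,$

and hence

(5.64)  $U_n^{(3)} = O(n^{-3}) + \dfrac{1}{n^2}\,\dfrac{1}{400\pi} \int_{0}^{\infty} u^6 e^{-u^2}\,du.$

Using (4.291) with $k = 3$ gives

(5.65)  $U_n^{(3)} = O(n^{-3}) + \dfrac{1}{n^2}\,\dfrac{3}{1280\pi^{1/2}}.$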




The method used to estimate $U_n^{(3)}$ shows that $U_n^{(4)} = O(n^{-3})$. This and the fact that $U_n^{(1)} = O(n^{-3})$ imply that $U_n = O(n^{-3}) + U_n^{(3)}$ and hence that

(5.66)  $U_n = \dfrac{1}{n^2}\,\dfrac{3}{1280\pi^{1/2}} + O\!\left(\dfrac{1}{n^3}\right).$

This is the result given in (1.91) in the introduction. The dominant term in (5.66) has the decimal approximations

$\dfrac{1}{n^2}\,\dfrac{3}{1280\pi^{1/2}} = \dfrac{0.001322319}{n^2} = \dfrac{1}{756.2470\,n^2} = \dfrac{1}{(27.5000\,n)^2}.$

To complement (5.66), it is of interest to have an inequality which gives information about $U_n$ when $n$ has a fixed value, say 10 or 25 or 100. We obtain such an inequality by use of more precise relations between $\varphi(t)$ and $e^{-t^2/2}$.


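Where $\delta$ is a positive number to be determined later, we start with (5.3) and put $U_n$ in the form

(5.7)  $U_n = V_n^{(1)} + V_n^{(2)},$

where

(5.71)  $V_n^{(1)} = \dfrac{1}{n^{1/2}\pi} \int_{0}^{\delta} \left|\left(\dfrac{\sin at}{at}\right)^n - e^{-nt^2/2}\right|^2 \dfrac{dt}{t^2},$

(5.72)  $V_n^{(2)} = \dfrac{1}{n^{1/2}\pi} \int_{\delta}^{\infty} \left|\left(\dfrac{\sin at}{at}\right)^n - e^{-nt^2/2}\right|^2 \dfrac{dt}{t^2}.$

In estimating $V_n^{(1)}$ we simplify formulas by setting

(5.73)  $B = e^{-t^2/2}, \qquad A = (\sin at)/at.$

Use of the elementary power series expansions of $e^x$ and $\sin x$ and the fact that $a = 3^{1/2}$ gives

(5.74)  $B - A = \sum_{k=0}^{\infty} (-1)^k \left[\dfrac{1}{2^k\,k!} - \dfrac{3^k}{(2k + 1)!}\right] t^{2k},$

and hence

(5.75)  $\dfrac{B - A}{t^4} = \sum_{k=0}^{\infty} (-1)^k \left[\dfrac{1}{2^{k+2}(k + 2)!} - \dfrac{3^{k+2}}{(2k + 5)!}\right] t^{2k}.$

Rewriting (5.75) to display the numerical values of the coefficients of the first terms, we have

(5.76)  $\dfrac{B - A}{t^4} = \dfrac{1}{20} - \dfrac{13}{840}\,t^2 + \dfrac{1}{420}\,t^4 - \cdots.$

We now let

(5.77)  $\delta = (24/13)^{1/2}.$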





When $0 < t < \delta$, the series in (5.75) and (5.76) is a convergent alternating series of the form $u_0 - u_1 + u_2 - \cdots$ in which $u_0 > 0$ and $u_0 > u_1 > u_2 > \cdots$. Hence, when $0 < t < \delta$,

(5.78)  $0 < \dfrac{B - A}{t^4} < \dfrac{1}{20}.$

Thus $0 < A < B$ and use of

(5.79)  $B^n - A^n = (B - A)\sum_{k=0}^{n-1} A^k B^{n-1-k} \le n(B - A)B^{n-1}$

gives

(5.8)  $0 < \dfrac{B^n - A^n}{t} < n\,\dfrac{t^3}{20}\,e^{-(n-1)t^2/2}.$

Use of (5.71), (5.73), and (5.8) gives

(5.81)  $V_n^{(1)} < \dfrac{n^{3/2}}{400\pi} \int_{0}^{\delta} t^6 e^{-(n-1)t^2}\,dt.$

Making a change of the variable of integration in (5.81) gives, when $n > 1$,

(5.82)  $V_n^{(1)} < \left(\dfrac{n}{n - 1}\right)^{7/2} \dfrac{1}{n^2}\,\dfrac{1}{400\pi} \int_{0}^{(n-1)^{1/2}\delta} u^6 e^{-u^2}\,du,$

and use of (4.291) gives

(5.83)  $V_n^{(1)} < \left(\dfrac{n}{n - 1}\right)^{7/2} \dfrac{1}{n^2}\,\dfrac{3}{1280\pi^{1/2}}.$

Starting with (5.72) we find that

(5.84)  $V_n^{(2)} \le \dfrac{2}{n^{1/2}\pi} \int_{\delta}^{\infty} \left[\left(\dfrac{\sin at}{at}\right)^{2n} + e^{-nt^2}\right] \dfrac{dt}{t^2}.$

Since $|\sin at| \le 1$, $a = 3^{1/2}$, and $\delta > 1$ we obtain

(5.85)  $V_n^{(2)} \le \dfrac{2}{n^{1/2}\pi} \int_{\delta}^{\infty} \left[\dfrac{1}{3^n t^{2n+2}} + \dfrac{e^{-n\delta^2}}{t^2}\right] dt = \dfrac{2}{n^{1/2}\pi} \left[\dfrac{1}{(2n + 1)3^n\delta^{2n+1}} + \dfrac{e^{-n\delta^2}}{\delta}\right].$

But $3\delta^2 = 72/13 = 5.54 > 5.5$ and $e^{\delta^2} = e^{24/13} = 6.34 > 5.5$. Since $\pi\delta = 4.27 > 4$, it follows that

(5.86)  $V_n^{(2)} < \dfrac{1}{n^{1/2}}\,\dfrac{1}{2(5.5)^n}.$

From (5.7), (5.83), and (5.86) we obtain

(5.9)  $U_n < \dfrac{1}{n^2}\left(\dfrac{n}{n - 1}\right)^{7/2} \dfrac{3}{1280\pi^{1/2}} + \dfrac{1}{n^{1/2}}\,\dfrac{1}{2(5.5)^n}.$




Even for values of $n$ as small as 5, the second term in the right member of (5.9) is substantially less than the first term. When $n = 10$, the dominant term in (5.66) has the value 0.00001322. The last term in (5.9) is less than $10^{-8}$ and (5.9) shows that $U_{10} < 0.0000192$, the factor $[n/(n - 1)]^{7/2}$ having the value 1.4459 when $n = 10$. When $n = 100$, the dominant term in (5.66) has the value 0.0000001322 and (5.9) shows that $U_{100} < 0.0000001370$.
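The numerical statements in this closing paragraph can be reproduced with a few lines of arithmetic. The sketch below evaluates the dominant term 3/(1280 pi^{1/2} n^2) and an upper bound of the form described by (5.9). The exact form of the second term, 1/(2 n^{1/2} (5.5)^n), is a reading of a badly printed formula and should be treated as an assumption; the function names are likewise chosen here.

```python
import math

# Coefficient of 1/n^2 in the dominant term for the uniform distribution.
LEADING = 3.0 / (1280.0 * math.sqrt(math.pi))

def dominant_term(n):
    """Dominant term of U_n."""
    return LEADING / n ** 2

def upper_bound(n):
    """Assumed reading of (5.9): inflated dominant term plus a small remainder."""
    inflation = (n / (n - 1)) ** 3.5                    # factor [n/(n-1)]^(7/2)
    remainder = 1.0 / (2.0 * math.sqrt(n) * 5.5 ** n)   # assumed second term
    return inflation * dominant_term(n) + remainder

print(LEADING, 1.0 / LEADING, math.sqrt(1.0 / LEADING))
for n in (5, 10, 100):
    print(n, dominant_term(n), (n / (n - 1)) ** 3.5, upper_bound(n))
```

Under this reading the second term is already far below the first at n = 5, and the bound exceeds the dominant term essentially only through the factor [n/(n-1)]^{7/2}.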


REFERENCES 
[1] R. P. Agnew, "Global versions of the central limit theorem," Proceedings of the National Academy of Sciences (USA), Vol. 40 (1954), pp. 800-804.
[2] Harald Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946, 575 pp.
[3] H. L. Royden, "Bounds on a distribution function when its first n moments are given," Ann. Math. Stat., Vol. 24 (1953), pp. 361-376.





VARIANCES OF VARIANCE COMPONENTS: II. THE UNBALANCED SINGLE CLASSIFICATION¹


By JOHN W. TUKEY
Princeton University

1. Summary. The variance of the usual estimate of between variance com- 
ponents in an unbalanced single classification has been found for arbitrary in- 
finite populations by Hammersley [1], who found it necessary to use rather 
heavy algebra. The methods of polykays are here applied to a family of weighted 
estimates to obtain the variances and covariances of the estimates of between 
and within variance components. These apply to arbitrary finite populations. 

Weighting column means equally seems to give a better estimate than the 
classical proportional weighting for the between variance component as soon as 
(i) the between component exceeds 4 of the within component in a moderately 
unbalanced design, or (ii) the between component exceeds the within component 
in a substantially unbalanced design. Slight further gains come from intermediate weighting. Numerical examples are given.

While pooling mean squares instead of sums of squares across columns loses 
accuracy, notably for the within variance component, doing the same in calcu- 
lating the between variance component seems to have a minor effect. If the 
within contributions are sufficiently non-normal, this effect will be favorable. 


2. Introduction. This paper closely follows the method and concept of the 
first paper of this series [2], familiarity with the techniques and results of which
is assumed. The present paper deals with the unbalanced single classification, 
where we have observations in the various columns. The actual observations 
are supposed to be representable in the form 


(observation) = (column contribution) + (cell contribution), 


where each class of contribution arises from a separate population, or popula- 
tions, and some independence is assumed for the selection or sampling of con- 
tributions in the different classifications (this is not a serious element of un- 
realism for a single classification situation). 

A wide variety of models can be constructed within this framework. The way 
in which the lack of balance arises may be very important. If the number of ob- 
servations in a column is at all related to the value of the corresponding column 
contribution (as might be the case if items with potentially extreme values were 
preferentially lost), the situation becomes very complex, and may be outside 


Received April 6, 1956 

1 Prepared in connection with research sponsored by the Office of Naval Research and
based on part of Memorandum Report 45, ‘‘Finite Sampling Simplified,’’ Statistical Re- 
search Group, Princeton University, which was written while the author was a Fellow of 
the John Simon Guggenheim Foundation. 




the scope of both the present paper and the literature known to this writer. We 
shall assume a fixed pattern of column sizes and a random arrangement of the 
column contributions among them. 

There still remain various possibilities for the cell contributions. These could, 
for example, be drawn from a single population, or from a family of populations, 
one per column. In the interests of simplicity, we shall begin with the case where 
only one population is involved. 


3. Types of analysis. Having specified the probability model, we are not, as 
an acquaintance with balanced designs alone might suggest, through with the 
specification of the problem. There are various possible analyses to make of the 
observations, as we shall shortly see. Let $\{x_{ij}\}$, with $i = 1, 2, \ldots, c$ and $j = 1, 2, \ldots, r_i$, be the observations, let $\{x_{i+}\}$ be the column totals, $x_{++}$ be the grand total, and let $R$, the sum of the $r_i$, be the total number of observations.

If we are to have unbiased estimates of the variance components, they will be 
quadratic functions of the $x_{ij}$ with coefficients depending on $c$ and the $\{r_i\}$. In
principle, we could start with a general quadratic function and then optimize its 
coefficients in some way. In practice, we select two quadratic functions by some 
scheme involving elements of intuition, find how their average values are ex- 
pressed linearly in terms of the variance components, and then form two linear 
combinations of the original quadratics whose average values are the variance 
components. These linear combinations are then our estimates. Much flexibility 
is possible in this situation, but only a limited amount of flexibility seems to be 
customary. 

Within each column, reasons of symmetry favor using

$\sum_j \left(x_{ij} - \dfrac{x_{i+}}{r_i}\right)^2,$

but we may as well be prepared for the use of arbitrary weights in combining these pieces. For the first quadratic then, we take

$J = \sum_i u_i \sum_j \left(x_{ij} - \dfrac{x_{i+}}{r_i}\right)^2 = \sum_i u_i \sum_j x_{ij}^2 - \sum_i u_i \dfrac{x_{i+}^2}{r_i}.$

The average value of $J$ will be shown to be $\sum u_i(r_i - 1)$ times the within-variance component, so that the within estimate is immediately constructible.

The other quadratic is usually definable in terms of the column means $x_{i+}/r_i$ and some weighted grand mean

$\dfrac{1}{W} \sum_i w_i \dfrac{x_{i+}}{r_i},$

where $W$ is the sum of the weights $w_i$. The usual expression is the weighted sum of squares of deviations

$L = \sum_i w_i \left(\dfrac{x_{i+}}{r_i} - \dfrac{1}{W} \sum_j w_j \dfrac{x_{j+}}{r_j}\right)^2 = \sum_i w_i \left(\dfrac{x_{i+}}{r_i}\right)^2 - \dfrac{1}{W}\left(\sum_i w_i \dfrac{x_{i+}}{r_i}\right)^2.$




We shall confine our analyses to this family of cases, which is parametrized by the $\{u_i\}$ and the $\{w_i\}$.
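The two quadratics of this family are simple to compute from column data. The sketch below is a minimal illustration, assuming the forms described above: a weighted sum of within-column sums of squares (weights u_i) and a weighted sum of squared deviations of column means from the weighted grand mean (weights w_i). The function names and the small data set are hypothetical.

```python
def within_quadratic(columns, u):
    """J: sum over columns of u_i times the within-column sum of squares."""
    total = 0.0
    for col, ui in zip(columns, u):
        mean = sum(col) / len(col)
        total += ui * sum((x - mean) ** 2 for x in col)
    return total

def between_quadratic(columns, w):
    """L: weighted sum of squared deviations of column means from the weighted grand mean."""
    means = [sum(col) / len(col) for col in columns]
    weight_sum = sum(w)
    grand = sum(wi * m for wi, m in zip(w, means)) / weight_sum
    return sum(wi * (m - grand) ** 2 for wi, m in zip(w, means))

# Hypothetical unbalanced layout with column sizes 2, 3 and 1.
columns = [[4.0, 6.0], [1.0, 2.0, 3.0], [10.0]]
sizes = [len(col) for col in columns]

J = within_quadratic(columns, [1.0, 1.0, 1.0])
L_proportional = between_quadratic(columns, sizes)        # weights w_i = r_i
L_equal = between_quadratic(columns, [1.0, 1.0, 1.0])     # weights w_i = 1
print(J, L_proportional, L_equal)
```

Passing the column sizes as weights gives the customary proportional analysis, and passing unit weights gives the equally-weighted-column analysis; any intermediate weighting is obtained by changing the list passed as `w`.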

The choice $w_i = r_i$ gives the customary analyses, which treat observations as
important and columns as unimportant. This is appropriate when testing signifi- 
cance or when the column variance component is small compared with the within 
variance component. 

The choice $w_i = 1$ gives the equally-weighted-column analyses, which treat
columns as important and observations as unimportant. This is appropriate 
when the column variance component is large compared with the within variance 
component. 

Some intermediate choice of weights may indeed be preferred. 

Finally, as we shall see, it is sometimes possible to choose the weights so that, 
although the pattern is unbalanced, the analysis becomes a balanced analysis 
in the sense of Paper I [2]. This, we shall see, occurs when we try to make the 
estimate of the between variance component unbiased whenever the variances 
within the various columns differ. 

We shall try to obtain as general answers as seem useful for this family of cases.


4. Model and elementary results. Our model is

$x_{ij} = \mu + \eta_i + \omega_{ij},$
$\eta$'s from $n, k_1, k_{11}, \ldots$
$\omega$'s from $N, K_1, K_{11}, \ldots;$

independently and randomly sampled and arranged.

It is easy to see that the values of the $\eta$ do not affect the value of the within sum of squares $J$. Since $J$ is quadratic in the $\eta$'s and $\omega$'s and invariant under a common translation of all the values in either population, we must have

ave $\{J\} = \phi K_2.$

Since $L$ is also quadratic and invariant, we must have

ave $\{L\} = \zeta k_2 + \xi K_2.$

The estimates of the variance components will then be

within $= \dfrac{1}{\phi}\,J, \qquad$ between $= \dfrac{1}{\zeta}\,L - \dfrac{\xi}{\zeta\phi}\,J.$

Arguments similar to those just used, and entirely parallel to those used in Paper I for the balanced case, now determine the finite population corrections and the vanishing of certain coefficients in the expressions for variances and covariance. The results are


9 
var {between} = (« — 2) ky t+ (6 - —" + nkeKe + 6K, + «4 Kn, 


bia 1 2 . 
var {within} = (6. -- 7) K,t+ (« - w7) Ku, 


cov {between, within} = 6; K, + 6 Ke. 
We are left with the task of determining $\phi, \zeta, \xi, \alpha_1, \beta_1, \gamma_1, \delta_1, \delta_2, \delta_3, \epsilon_1, \epsilon_2, \epsilon_3$;
we may do this by treating separately the cases of (i) single minimal unit
populations, which together give us $\phi, \zeta, \xi, \alpha_1, \delta_1, \delta_2, \delta_3$, and (ii) normal theory,
which gives us $\beta_1, \gamma_1, \epsilon_1, \epsilon_2, \epsilon_3$. These are our next tasks. We shall find it
helpful to calculate, as intermediate quantities, some of the coefficients in the
formulas for the variances and covariances of $J$ and $L$. These formulas are of the forms


é 2° , 
var {L} = (a. — Ly ky + (6. - — ) Koo + ike Ke 
n n—1 


+ (6. - )K.+(«a -—%-) ke, 
n—1 
e\ 

var {J} = i - 2) Ke +(e - 8 Kx, 


cov {J, L} (3. i #) K, + («. _ ) Ke. 


n n—1 


5. A single-unit column. If we take the special case where all $\omega$'s are zero
($K_2 = K_4 = K_{22} = 0$) and the $\eta$'s are a minimal unit population (just enough to
go around, all zero except one and that one equal to unity, $n = c$, $k_2 = k_4 = 1/c$,
$k_{22} = 0$), then we can make the first step. We will have $J = 0$, and if the
unit $\eta$ falls into the $j$th column (an event of probability $1/c$), we shall have
\[ x_{j\cdot} = \eta_j, \qquad \text{other } x_{i\cdot} = 0, \qquad L = w_j - w_j^2/W = a_j, \]
which defines $a_j$. (If we write $\theta_j$ for the relative weight $w_j/W$, then
$a_j = W\theta_j(1 - \theta_j)$.)
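Thus,
\[ A = \sum a_j = W - \frac{1}{W}\sum w_j^2, \]
and hence
\[ \frac{A}{c} = \operatorname{ave}\{L\} = \zeta k_2 + \xi K_2 = \zeta\cdot\frac{1}{c}, \]
so that $\zeta = A$. Correspondingly,
\[ \operatorname{var}\{L\} = \frac{1}{c}\sum a_j^2 - \left(\frac{A}{c}\right)^2, \]
so that
\[ \alpha_1 = \sum a_j^2 = W^2 \sum \left(\frac{w_j}{W}\right)^2\left(1 - \frac{w_j}{W}\right)^2. \]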


6. A single-unit observation. Take, now, the special case where the $\omega$'s are a
minimal unit population ($N = R$, $K_2 = K_4 = 1/R$, $K_{22} = 0$) and all the $\eta$'s
vanish ($k_2 = k_4 = k_{22} = 0$). If the single non-zero $\omega$ falls in the $j$th column, an
event whose probability is now $r_j/R$ and not $1/c$, then we have
\[ J = u_j\left(1 - \frac{1}{r_j}\right), \qquad L = \frac{a_j}{r_j^2} = b_j. \]
(Note that $a_j = r_j^2 b_j$.)
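The average value of $J$ is
\[ \operatorname{ave}\{J\} = \sum \frac{r_j}{R}\,u_j\left(1 - \frac{1}{r_j}\right) = \phi K_2 = \frac{\phi}{R}, \]
whence $\phi = \sum u_j(r_j - 1)$. The variance of $J$ is
\[ \operatorname{var}\{J\} = \sum \frac{r_j}{R}\,u_j^2\left(1 - \frac{1}{r_j}\right)^2 - \left(\frac{\phi}{R}\right)^2, \]
so that
\[ \delta_2 = \sum r_j u_j^2\left(1 - \frac{1}{r_j}\right)^2 = \sum u_j^2\left(r_j - 2 + \frac{1}{r_j}\right). \]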

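The average value of $L$ is
\[ \frac{1}{R}\sum r_j b_j = \operatorname{ave}\{L\} = \zeta k_2 + \xi K_2 = \frac{\xi}{R}, \]
so that
\[ \xi = \sum r_j b_j = \sum \frac{w_j}{r_j}\left(1 - \frac{w_j}{W}\right). \]
For the two special choices of the $w_j$, this reduces to
\[ \xi = c - 1 \quad (\text{for } w_j = r_j), \qquad \xi = \left(1 - \frac{1}{c}\right)\sum \frac{1}{r_j} \quad (\text{for } w_j = 1). \]
The variance of $L$ is
\[ \operatorname{var}\{L\} = \sum \frac{r_j}{R}\,b_j^2 - \left(\frac{\xi}{R}\right)^2, \]
so that
\[ \delta_1 = \sum r_j b_j^2. \]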
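The single-unit results of Sections 5 and 6 are easy to check numerically. The sketch below is illustrative only; it assumes $b_j = a_j/r_j^2$ (the reading under which the two special values of $\xi$ quoted above hold), and the function name `single_unit_coefficients` is hypothetical. Exact rational arithmetic is used so that the special cases can be compared without rounding.

```python
from fractions import Fraction

def single_unit_coefficients(r, w, u):
    """Coefficients obtained from the two minimal-unit-population cases."""
    w = [Fraction(x) for x in w]
    u = [Fraction(x) for x in u]
    W = sum(w)
    a = [wj - wj * wj / W for wj in w]             # a_j = w_j - w_j^2 / W
    b = [aj / (rj * rj) for aj, rj in zip(a, r)]   # assumed: a_j = r_j^2 * b_j
    return {
        "zeta": sum(a),                            # zeta = A
        "alpha1": sum(aj * aj for aj in a),
        "phi": sum(uj * (rj - 1) for uj, rj in zip(u, r)),
        "xi": sum(rj * bj for rj, bj in zip(r, b)),
        "delta1": sum(rj * bj * bj for rj, bj in zip(r, b)),
        "delta2": sum(uj * uj * (rj - 2 + Fraction(1, rj))
                      for uj, rj in zip(u, r)),
    }
```

For column sizes 10, 10, 10, 5, 5 with $w_j = r_j$ and $u_j = 1$ this gives $\xi = 4 = c - 1$ and $\phi = 35 = R - c$; with $w_j = 1$ it gives $\xi = 14/25 = (1 - 1/c)\sum 1/r_j$.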





The covariance of $J$ and $L$ is
\[ \operatorname{cov}\{J, L\} = \sum \frac{r_j}{R}\,b_j u_j\left(1 - \frac{1}{r_j}\right) - \frac{\phi\,\xi}{R^2}, \]
whence
\[ \delta_3 = \sum u_j b_j (r_j - 1). \]


7. Normal theory. We now need to calculate variances and covariances of $J$
and $L$ on normal theory (where $k_4 = K_4 = 0$, $K_{22} = K_2^2$, $n = N = \infty$). The
sums of squares within each separate column are distributed as multiples of chi
square, independently of each other and of $L$, so that we have
\[ \operatorname{var}\{J\} = \sum u_j^2 \cdot 2(r_j - 1)K_2^2 = \epsilon_2 K_{22}, \]
whence
\[ \epsilon_2 = 2\sum u_j^2 (r_j - 1), \]
and
\[ \operatorname{cov}\{J, L\} = 0 = \epsilon_3 K_{22}, \]
whence
\[ \epsilon_3 = 0. \]


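In calculating var $\{L\}$, it will be convenient to assume that all means are zero,
so that
\[ \operatorname{var}\Big\{\big(\sum c_i x_{i\cdot}\big)^2\Big\} = 2\Big(\operatorname{var}\Big\{\sum c_i x_{i\cdot}\Big\}\Big)^2, \]
and to write
\[ z = \sum w_i x_{i\cdot}, \]
\[ \operatorname{var}\{x_{i\cdot}\} = k_2 + \frac{K_2}{r_i}, \]
\[ \operatorname{var}\{z\} = \sum w_i^2\,k_2 + \Big(\sum w_i^2/r_i\Big) K_2, \]
\[ \operatorname{cov}\{x_{i\cdot}, z\} = w_i\Big(k_2 + \frac{K_2}{r_i}\Big), \]
and since
\[ L = \sum w_i x_{i\cdot}^2 - \frac{1}{W}\,z^2, \]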





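we have
\[ \operatorname{var}\{L\} = 2\sum w_i^2\Big(k_2 + \frac{K_2}{r_i}\Big)^2 + \frac{2}{W^2}\Big(\sum w_i^2\,k_2 + \big(\sum w_i^2/r_i\big) K_2\Big)^2 - \frac{4}{W}\sum w_i\Big(w_i k_2 + \frac{w_i}{r_i} K_2\Big)^2 \]
\[ = \beta_1 k_2^2 + \gamma_1 k_2 K_2 + \epsilon_1 K_2^2, \]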
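As a cross-check on the normal-theory variance of $L$, one can compare the three-term expression with the general identity $\operatorname{var}\{x'Mx\} = 2\operatorname{tr}(MVMV)$ for a zero-mean normal vector $x$ with covariance matrix $V$. The sketch below is an illustration, not part of the original derivation: it takes $M = \operatorname{diag}(w) - ww'/W$ and $V = \operatorname{diag}(k_2 + K_2/r_i)$, and the function names are hypothetical.

```python
from fractions import Fraction

def var_L_three_terms(w, r, k2, K2):
    """Three-term normal-theory expression for var{L}."""
    w = [Fraction(x) for x in w]
    W = sum(w)
    v = [Fraction(k2) + Fraction(K2) / ri for ri in r]   # var of column means
    s = sum(wi * wi * vi for wi, vi in zip(w, v))        # var{z}
    return (2 * sum((wi * vi) ** 2 for wi, vi in zip(w, v))
            + 2 * s * s / (W * W)
            - 4 * sum(wi ** 3 * vi * vi for wi, vi in zip(w, v)) / W)

def var_L_trace(w, r, k2, K2):
    """Same variance from var{x'Mx} = 2 tr(M V M V)."""
    w = [Fraction(x) for x in w]
    W = sum(w)
    v = [Fraction(k2) + Fraction(K2) / ri for ri in r]
    c = len(w)
    # Entry (i, j) of the product M V, with M = diag(w) - w w' / W.
    MV = [[((w[i] if i == j else 0) - w[i] * w[j] / W) * v[j]
           for j in range(c)] for i in range(c)]
    trace = sum(MV[i][j] * MV[j][i] for i in range(c) for j in range(c))
    return 2 * trace
```

The two functions agree exactly for any choice of weights, column sizes, and variance components, which gives some reassurance about the signs and factors in the display above.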


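whence
\[ \beta_1 = 2\sum a_i^2 + \frac{2}{W^2}\Big[\big(\sum w_i^2\big)^2 - \sum w_i^4\Big], \]
\[ \gamma_1 = 4\sum \frac{a_i^2}{r_i} + \frac{4}{W^2}\Big[\big(\sum w_i^2\big)\Big(\sum \frac{w_i^2}{r_i}\Big) - \sum \frac{w_i^4}{r_i}\Big], \]
\[ \epsilon_1 = 2\sum \frac{a_i^2}{r_i^2} + \frac{2}{W^2}\Big[\Big(\sum \frac{w_i^2}{r_i}\Big)^2 - \sum \frac{w_i^4}{r_i^2}\Big]. \]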
8. Combined results. Combining all of the results above, using such relations as


and writing $\psi = \xi/\phi$, we find


oy ae 
2 (5 


Br = Da + (XL wi)? - (L w))), 







4(22+ p((L*)- (4), 


eS Dj — WL ud; — + ¥ E u(r, —~2+- )), 


rj 


- m((x2) -L)+¥ Lue - 1)) 


, 


d+ m(E2) -(9)+ 2p 


“ec as Zailu-2 +7) 


1 ru 
‘ta-7°-* 


= Wage (x ujb(r; — 1) — yy a(n —: 


Se 
= - Ss, 


os bi ie, = 
~ ASue, app 


y= “ LD 5b; 
> uj (r; — 1)’ 
y - 1b Lwdilrs — 1) 


A YS ujlr;—1) * 


These results are not easy to digest, but, computationally, they are quite
manageable. If we introduce $g_j = u_j(r_j - 1)$ and put $f_j = a_j/A$, so that
$b_j/A = f_j/r_j^2$, and rearrange the order of the equations, we may write them in the
following way:


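\[ \alpha = \sum f_j^2, \]
\[ \beta = 2\alpha + \frac{2}{A^2W^2}\Big[\big(\sum w_j^2\big)^2 - \sum w_j^4\Big], \]
\[ \gamma = 4\sum (f_j^2/r_j) + \frac{4}{A^2W^2}\Big[\big(\sum w_j^2\big)\big(\sum (w_j^2/r_j)\big) - \sum (w_j^4/r_j)\Big], \]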


& = > ug; € _ Vx 9), 







2) Uj Ji/ (> 9;) 5 

= [> (f;/rs) lee ’ 

~(h:/73)9:/>_ gi; 

& —[D0 (fi/rs) oe, 

» LGi/r) — ALG /rd 8 + LGi/r) Pee, 
2 2 al 
2D) + gm (Dtwi/rd¥ — Dewl/] + (Lhi/nyfe. 
(The precise form of these equations has been chosen with computation in mind;
it takes account of the fact that the $w_j$ are likely to be small integers.)
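The computational forms lend themselves to direct evaluation. The following sketch is a hedged illustration: it evaluates only the first three coefficients of var {between}, using the forms $\alpha = \sum f_j^2$, $\beta = 2\alpha + (2/A^2W^2)[(\sum w_j^2)^2 - \sum w_j^4]$ and $\gamma = 4\sum f_j^2/r_j + (4/A^2W^2)[(\sum w_j^2)(\sum w_j^2/r_j) - \sum w_j^4/r_j]$, which follow from the single-unit and normal-theory results of Sections 5 to 7. The function name `between_coefficients` is hypothetical.

```python
def between_coefficients(r, w):
    """Return (alpha, beta, gamma) for column sizes r and column weights w."""
    W = sum(w)
    a = [wj - wj * wj / W for wj in w]          # a_j = w_j - w_j^2 / W
    A = sum(a)
    f = [aj / A for aj in a]                    # f_j = a_j / A
    s2 = sum(wj ** 2 for wj in w)
    s4 = sum(wj ** 4 for wj in w)
    s2r = sum(wj ** 2 / rj for wj, rj in zip(w, r))
    s4r = sum(wj ** 4 / rj for wj, rj in zip(w, r))
    scale = 1.0 / (A * A * W * W)
    alpha = sum(fj * fj for fj in f)
    beta = 2 * alpha + 2 * scale * (s2 * s2 - s4)
    gamma = (4 * sum(fj * fj / rj for fj, rj in zip(f, r))
             + 4 * scale * (s2 * s2r - s4r))
    return alpha, beta, gamma

if __name__ == "__main__":
    sizes = [10, 10, 6, 6, 2, 2]                # Set II of Section 10
    for weights in (sizes, [2, 2, 2, 2, 1, 1], [1] * 6):
        print(weights, between_coefficients(sizes, weights))
```

For Set II of Section 10 the three weightings shown give values that agree with the first three rows of Table 4 to within one unit in the fifth decimal place.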
9. Special cases. The quantities appearing in the formulas for $\alpha$ to $\epsilon$ fall
naturally into several groups according to which of the $\{r_j\}$, $\{w_j\}$, and $\{u_j\}$ they
involve.

TABLE 1
Quantities depending on $\{w_j\}$ and $\{r_j\}$ but not on $\{u_j\}$

Quantity | General Form | Form for $w_j = r_j$ | Form for $w_j = 1$


TABLE 2
Quantities depending on $\{u_j\}$ and $\{r_j\}$ but not on $\{w_j\}$

Quantity | General Form | Form for $u_j = 1$ | Form for $u_j = 1/(r_j - 1)$
         | $\sum u_j (r_j - 1)$ | |
         | $\sum u_j^2 (r_j - 1)$ | |
         | $\sum u_j^2 (r_j - 2 + 1/r_j)$ | |


The various quantities and their special values are given in Tables 1
and 2 with the exception of $\psi$, which is the only quantity essentially involving
both $\{w_j\}$ and $\{u_j\}$. Its special values are:


1 1 c 1 
te(e- 1+ R- Z3) ra 


say 


It is now quite clear that the result is not likely ever to become algebraically
simple, whatever the values of $\{w_j\}$ and $\{u_j\}$.
One of the simplest cases arises when $w_j = r_j$ and $u_j = 1$. Here we have
( AR — x.))* 
var {between} = {2(rAR — 1) i) ~ 1\ kg 
(Lar(R — ry 2) 
[Ede an i th l |e 
aed = =—— hoo 
Ss? n—1 
a. = R(R — 1) ( . 5) | : 
+ hk: + SRS: [x ri R Ky, 
2c — 1)(R — 1)R 


_ (r — c)S? 


Kx, 





tr 1 1 ¢ pe id 
var { within } = (qty my - st + iF “ryt x) K, 
j i 


2 2 
4gh- nies 


id R(R — i 1 ce 2R(c — 1 
cov {between, within! = — See {x ;; - 3} K, - ea Kn, 
where $R = \sum r_j$ and $S = \sum r_j(R - r_j) = R^2 - \sum r_j^2$.
Since the first of these checks with Hammersley's result [1] for the between
variance, we can have reasonable confidence that the result is right, since the
algebra involved here is somewhat different from his.


10. Numerical examples. In order to learn what these formulas imply, it
seems necessary to carry out at least a few numerical examples. The coefficients
for several special choices of $\{w_j\}$ and $\{u_j\}$ and each of the following sets of
$\{r_j\}$ are given in Tables 3, 4, and 5:

(Set I) $\{r_j\}$ = 10, 10, 10, 5, 5.
(Set II) $\{r_j\}$ = 10, 10, 6, 6, 2, 2.
(Set III) $\{r_j\}$ = 4, 4, 4, 4, 4, 2, 2, 2, 2, 2.


TABLE 3 


Coefficients in variance and covariance of variance components for an unbalanced
design of structure $5^2$, $10^3$


w;, for r; g ; 1 
wi, for r; 





Ui, for ri 
ui, for r; 





a) 
Br 
Y1 
6; 


€1 




















Variance of between | 
variance component | 
for kos = 1K oo 9 | 

k2Ke = VWkn Ke , and } 

ky = —ke ‘ K, = —K ° 609k. 626k 22 | . 628k 22 

ks = K, = 0 ° C22 814k | 827k . | -828kee 

ky = Shoo ; K, = 4K oo a | 1 .634kee | 1.632ke2 | . 1.629k22 










Both common sense and an examination of the tables of coefficients show us 
that if the between component is much larger than the within component, we 
will do better, in calculating the between component, to weight the column means 
equally. Similarly, if the between component is very small, we will do best to
weight the column means in proportion to the number of entries. The big question
is, Where does the crossover take place? Tables 3, 4, and 5 also give the
variance of the between component when the between component is ½ the within
component for various degrees of non-normality. When we examine these values,
we see that changes in $k_4/k_{22}$ and $K_4/K_{22}$ can have effects which are large
compared to the weighting system. We see further that in Set I (two columns of 5
entries and 3 of 10) it is already better to use equal weights when (between) =
½ (within) than to use proportional ones, although a slight further gain can be
had from an intermediate weighting system. In Set II (two columns each of 10, 
6, and 2) equal weighting is not yet as good as proportional to number. However, 
a weighting procedure which weights columns of 10 and 6 both twice as much as 
a column of 2 is better than either for most sorts of non-normality. For this case, 
where the ratio of extreme column sizes is 10/2 = 5, equal weighting is better 


TABLE 4 


Coefficients in variances and covariance of variance components for an unbalanced
design of structure $10^2$, $6^2$, $2^2$


w;,forr = % ; l 1 

w, , for r; = 6 j . 1 

w,;,forr; = ; l 

u;,forr; = ‘ 1 

u;,forr; = 6 l ; } } 
u,;, for r; 1 1 1 1 


ay 20271 17638 | .16667 | .20271 . 17638 . 16667 
.51349 .42950 40000 | 51349 $2950 . 40000 
.14173 .15728 | .20444 | .14173 15728 20444 
.00091 | .00269 .00639 | .00003 00043 .00187 
01465 | .02328 | .04028 | .01713 02689 04544 


02837 02837 02837 | .04259 04259 04259 
06667 .06667 06667 | .14568 14568 14568 


.00074 .00126 |—.00193 .00052 00250 00510 
01181 .01425 |—.01704 |—.02581 — .03115 .03723 


Variance of between 

variance component 

for ko = Ka, 
keKe = / kK ,and 
ky = —ko , Ky = —Kep 649k 22 ; 778k 22 663k 2 674kee 816k. 
kz = K, 0 856k. | 970k ee 866k. R52ke | .991 ke. 
ky = 4Koo , Ky = 4Kx 1.681 ko. 1.739ke. | 1.676ko | 1.564k.2 | 1. 687k» 







TABLE 5 
Coefficients in variances and covariance of variance components for an unbalanced
design of structure $4^5$, $2^5$


w;,forr; =: | 4 l 
w,;, for rr, = 
i orrg =. 
Hy 1 
u;,forr; = 4 + 


10900 10000 

24500 22222 

15000 | 16667 
0012: ; / 90001 | 00059 
03670 ' 04050 | 05308 
03438 03750 03750 
10000 ; .13333 | —-. 13333 
— 00113 00016 00156 
— .03375 | —.04500 


Variance of between variance com- 
penent for ko = }Ka, 

keKe = VWkenKn , and 

ky = —kn , Ky = —Kn .578k22 ‘ .598kee 

ky = K, = 0 692k 22 707k | - 768k22 


ky = 4kes , Ky = 4Kes 1.147k2 | 143k: | 1.177k2 


TABLE 6 


Comparative variance of the between variance component in unbalanced* and 
balanced designs 


Variance for ks = Ki = 0, 
ke = 3Km, and kK: = ./pnK, = iKn 


Pattern of Columns r; 


“4848 785k: 
5, 5, 10, 10, 10 S8l4kee 
7 -832k22 

797 kee 

S37 kee 

? 9 Dy Vy * 662k 22 

>) 4 , 2, 4, 4, 4, 4, 30 | 692k ee 

2, 2, 2, 2, 2, 2, 2, 4, « 20 1 .O89k22 


* With weights chosen from those in Tables 3, 4, and 5 to minimize this variance. 


than proportional weighting for $k_{22}$ near, but somewhat smaller than, $K_{22}$. In
Set III (five columns each of 2 and 4) we have not computed an intermediate
weighting system. Here proportional weighting is preferred until $k_{22}$ rises to
somewhat above $K_{22}$.







If we examine the effect of changing the $\{u_j\}$, we see that the case
$u_j = 1/(r_j - 1)$, which corresponds to pooling mean squares, rather than sums
of squares, across columns, is slightly less favorable in each set unless $K_4 =
4K_{22}$, when the reverse holds.


Finally, it is interesting to compare the variances of the between component 
in the unbalanced designs with those in balanced cases. This is done for one case 
in Table 6. The loss in effective number of observations for a ratio of 2 to 1 in 
column sizes is rather small, being perhaps 3 observations in the first and last 
cases. The loss for the middle case is larger, but not as large as might have been 
expected from the 10/2 = 5 ratio of extreme column sizes. 


REFERENCES 
[1] J. M. Hammersley, "The unbiased estimate and standard error of the interclass
variance," Metron, Vol. 15 (1949), pp. 189-205.


[2] John W. Tukey, "Variances of variance components: I. Balanced designs," Ann.
Math. Stat., Vol. 27 (1956), pp. 722-736.





SOME PROPERTIES OF GENERALIZED SEQUENTIAL
PROBABILITY RATIO TESTS¹


By J. Kiefer and Lionel Weiss
Cornell University and University of Oregon 


0. Introduction and Summary. Generalized sequential probability ratio tests 
(hereafter abbreviated GSPRT’s) for testing between two simple hypotheses have 
been defined in [1]. The present paper, divided into four sections, discusses certain 
properties of GSPRT’s. In Section 1 it is shown that under certain conditions the 
distributions of the sample size under the two hypotheses uniquely determine
a GSPRT. In the second section, the admissibility of GSPRT’s is discussed, ad- 
missibility being defined in terms of the probabilities of the two types of error 
and the distributions of the sample size required to come to a decision; in par- 
ticular, notwithstanding the result of Section 1, many GSPRT’s are inadmissible. 
In Section 3 it is shown that, under certain monotonicity assumptions on the 
probability ratios, the GSPRT’s are a complete class with respect to the prob- 
abilities of the two types of error and the average distribution of the sample 
size over a finite set of other distributions. In Section 4, finer characterizations 
are given of GSPRT’s which minimize the expected sample size under a third 
distribution satisfying certain monotonicity properties relative to the other two 
distributions; these characterizations give monotonicity properties of the decision 
bounds. 


1. Uniqueness of certain GSPRT’s. In this section we identify a GSPRT with 
the two sequences of limits characterizing it. Using the same notation as in [1], 
we assume that the Lebesgue densities $f_1$ and $f_2$ satisfy the conditions in Section
2 of [1], even for ¢ equal to zero or infinity (i.e., the probability ratio for any 
number of observations takes on no single value with positive probability), and 
are continuous (this last restriction is easily weakened; see also Remark 1 at the 
end of this section for further generalization). 


First we make the transformation $Y_i = F_1(X_i)$. Under $H_1$, $Y_i$ has a rectangular
distribution; under $H_2$, the density of $Y_i$ will be $g$ (say), where $\int_0^1 g(y)\,dy = 1$.

Next we make the transformation $Z_i = \phi[g(Y_i)]$, where $\phi(u)$ is strictly
increasing in $u$ for $u \ge 0$, $\phi(u)$ takes on no values outside the interval [0, 1], and
also
\[ \int_{\{y:\ \phi[g(y)] \le t\}} dy = t \quad \text{for } 0 \le t \le 1. \]
Under $H_1$, the distribution of $Z_i$ is rectangular, while under $H_2$, the density of
$Z_i$ is $\phi^{-1}$; we note that $\phi^{-1}(z)$ is strictly increasing in $z$ for $0 \le z \le 1$ and is 0
elsewhere.


Received December 30, 1955; revised October 2, 1956. 
1 Research sponsored by the Office of Naval Research. 




Since we can always transform from $X_1, X_2, \cdots$ to $Z_1, Z_2, \cdots$ and carry
out any GSPRT in terms of $Z_1, Z_2, \cdots$, from now on we assume that $f_1(x) = 1$
on [0, 1], and $f_2(x)$ is strictly increasing in $x$ for $0 \le x \le 1$, with $\int_0^1 f_2(x)\,dx = 1$.
We also assume $f_2$ piecewise differentiable (for the sake of obtaining the $g_i$ below).
Any GSPRT is carried out by seeing whether
\[ b_m < \prod_{i=1}^{m} f_2(x_i) < a_m, \text{ etc.,} \]
or, defining $Q_i$ as $\log f_2(X_i)$, $B_m$ as $\log b_m$, $A_m$ as $\log a_m$, and $W_m$ as
$Q_1 + Q_2 + \cdots + Q_m$, whether $B_m < W_m < A_m$, etc.

Denoting by $g_i$ the density function of $Q_1$ under $H_i$, we find that $g_2(q) = e^q
g_1(q)$ identically in $q$.

Suppose the $m - 1$ pairs $(a_1, b_1), \cdots, (a_{m-1}, b_{m-1})$ are fixed. The joint
conditional density function of $(Q_1, Q_2, \cdots, Q_m)$ under $H_i$, given that sampling
continues beyond the $(m - 1)$st observation, is
\[ h_i(q_1, \cdots, q_m) = \frac{\prod_{j=1}^{m} g_i(q_j)}{K_i} \]
in the region $\{b_j < w_j < a_j;\ j = 1, \cdots, m - 1\}$; $h_i(q_1, \cdots, q_m) = 0$ elsewhere.
Here $K_i$ = P{sampling continues beyond $(m - 1)$ observations under $H_i$}, and
we assume $K_1$ and $K_2$ are positive. Thus
\[ h_2(q_1, \cdots, q_m) = \frac{K_1}{K_2}\,e^{q_1 + \cdots + q_m}\,h_1(q_1, \cdots, q_m). \]


Then, denoting the conditional density of $W_m$ under $H_i$ given that sampling
continues beyond $m - 1$ observations by $k_i$, we have
\[ k_2(w) = \frac{K_1}{K_2}\,k_1(w)\,e^{w}. \]


Now we make the following Assumption A: $f_2$ is such that $g_i(q) > 0$ for almost
all $q$ ($i = 1, 2$); thus, if $S$ is any nondegenerate interval, $\int_S g_i(q)\,dq > 0$ for
$i = 1, 2$. But this implies that if $T$ is any nondegenerate interval,
$\int_T k_i(w)\,dw > 0$ for $i = 1, 2$.
For any given positive numbers $C$, $D$, we now show that there is at most one
solution $(\gamma, \delta)$ to the two equations
\[ \int_{\gamma}^{\delta} k_1(w)\,dw = C, \qquad \int_{\gamma}^{\delta} k_2(w)\,dw = D. \]
For, given any $\gamma$ which can be the first element of a solution, let $\delta(\gamma)$ be the
uniquely determined value of $\delta$ which satisfies the first equation. Then it is easily
verified that
\[ \int_{\gamma}^{\delta(\gamma)} k_2(w)\,dw = \frac{K_1}{K_2}\int_{\gamma}^{\delta(\gamma)} k_1(w)\,e^{w}\,dw \]
is strictly increasing in $\gamma$. This proves that there is at most one solution $(\gamma, \delta)$ to
the two equations.

Let $D_i(n; T)$ denote the probability that a decision is reached after no more
than $n$ observations when using the test $T$ and $H_i$ is true. The considerations
above yield

THEOREM 1. If Assumption A holds, and if $T$ is a nontruncated GSPRT, then
there is no GSPRT $T'$ different from $T$ and with $D_i(n; T') = D_i(n; T)$ for all $n$
and for $i = 1, 2$. If Assumption A holds, and if $T$ is a GSPRT truncated after $m$
observations, while $T'$ is another GSPRT with $D_i(n; T') = D_i(n; T)$ for all $n$ and
for $i = 1, 2$, then $T$ and $T'$ differ only in the terminal decision boundary at stage $m$.
REMARK 1. If Assumption A is violated, there are two different GSPRT's, $T$
and $T'$, with $D_i(n; T') = D_i(n; T)$ for all $n$ and for $i = 1, 2$. For we can find a
GSPRT $T$ whose first pair of limits is $(B_1, A_1)$, such that for a positive $\epsilon$,
\[ \int_{A_1}^{A_1 + \epsilon} g_i(q)\,dq = 0. \]
But then $T'$ can be taken as the GSPRT whose first pair of limits is $(B_1, A_1 + \epsilon)$,
the other limits agreeing with those for $T$. The inessential difference between
$T$ and $T'$ in such a case as this will be evident to the reader: every sample sequence
in a set of probability one under both $H_i$ suffers the same fate under $T'$ as under
$T$. Similarly, in the discrete case (or where $Q_i$ can take on some constant value
with positive probability), there is the aspect of randomization in which two
tests with identical $D_i(n; T)$ may differ (e.g., if $T$ and $T'$ both always require
at least 3 observations, for some value of $Q_1 + Q_2 + Q_3$, $T$ may stop after 3
observations if $Q_1 < Q_2$, $T'$ if $Q_2 < Q_1$). With these modifications in mind, it is
evident that Theorem 1 is of broader validity than its stated form.

REMARK 2. If Assumption A holds, one can prove similarly that there is at

most one GSPRT having given values for the elements of the following sequences 
of probabilities: 


| P(accepting H, under H, at stage n)}, 
|P(accepting H, under H, at stage n)}. 


Incidentally, it is easy by the methods of [1] or [2] to show that the GSPRT’s 
(and k-decision problem analogues) form a complete class with respect to the 
generalized risk function consisting of such sequences. A similar remark applies 
if these sequences and the D,(n; T) are considered together as the risk function, 
etc. 







2. Questions of admissibility.² We have proved in the previous section that
under certain conditions there is at most one GSPRT corresponding to any two
specified distributions of $n$ (one under each $H_i$). This does not imply that all
GSPRT's are admissible (letting $\alpha_i$ = probability of an incorrect decision under
$H_i$ and $p^*_{in}$ = probability that the experiment terminates after more than $n$
observations under $H_i$, a GSPRT is said to be admissible in this section if there is
no second procedure for which all of the numbers $\alpha_i$ and $p^*_{in}$ ($i = 1, 2$; $n = 0,
1, 2, \cdots$) are no greater, and at least one is less, than for the given procedure);
as we shall see in a simple example below, the general question of how to
characterize the admissible procedures seems quite difficult. Before turning to this
example, we make a few remarks on admissibility. Firstly, it is clear that
admissibility does not entail any simple monotone character of the constants $a_n$ and
$b_n$: on one hand, putting $p_{in}$ = probability of terminating after $n$ observations
under $H_i$, on considering the minimization of


2 20 i 
(2.1) o E; E W;+ 2 Din P| where D;,, = yl ie. 
i=l n=l I 

where the $C_{in}$ are an increasing (resp., decreasing) sequence of positive numbers,
by comparison with the case $C_{in} = 1$, it becomes clear that we may have admissible
procedures with $a_n\uparrow$, $b_n\downarrow$ (resp., $a_n\downarrow$, $b_n\uparrow$); similarly, there are admissible
procedures with a,7, b,7 or a,|, b,|. (In the case C;,7 (resp., C.n}), it is also 
interesting to note that a‘” < a, < b, < b'™ (resp.,a,n S a” Ss b'” <S b,,), 
where a’” and b‘” are the constant bounds for which (2.1) is minimized when 
Ci, is replaced by C;,.4: for all n.) On the other hand, by considering C;,, 
(h; + ki) + (—1)"h; , one may obviously obtain admissible procedures for which 
An = Ons2, On = Dnse, Gon < Congr < Dongi < ben for all n. Other admissible pro- 
cedures for which the a, and b, have no simple monotone character may be con- 
structed similarly. 

Secondly, one can sometimes give simple necessary conditions for admissibility
(a sufficient condition such as that of being the essentially unique procedure
which minimizes (2.1) for some choice of the constants $\xi_i$, $W_i$, $C_{in}$ will usually
be hard to verify). Suppose, as before, that every interval of positive values
(but no single value) of $f_2(x)/f_1(x)$ has positive probability under both $H_i$.
Let $N$ be the smallest integer for which $a_N = b_N$ (if no such integer exists, write
$N = \infty$). Then a necessary condition for admissibility of a GSPRT is
\[ (2.2) \qquad \sup_{n < N+1} a_n \le \inf_{n < N+1} b_n. \]
To see this, note that for any Bayes solution minimizing (2.1) the constants
$\xi_i$, $W_i$ must satisfy $a_n \le (W_1\xi_1/W_2\xi_2) \le b_n$ for all $n < N + 1$; thus, any Bayes
solution must satisfy (2.2). Since the essentially complete class of procedures
which is the closure in the sense of (Wald's) regular convergence (see [3], [2]) of

² In Section 2 the roles of the symbols $a$ and $b$ are reversed from what they are in
the other sections.







this class of Bayes procedures satisfying (2.2) also satisfies (2.2), and since (see 
Section 1) this class must include all admissible GSPRT’s, the necessity of (2.2) 
is established. As will be seen in the examples below, the condition (2.2) is not 
in general sufficient for admissibility. 

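We now give an example which will illustrate how complicated an explicit
delimitation of the admissible GSPRT's seems to be. Consider all GSPRT's
requiring at least 1 and at most 2 observations for testing $H_1: \theta = 1$ against
$H_2: \theta = 2$ where
\[ (2.3) \qquad f_\theta(x) = \begin{cases} \theta e^{-\theta x}, & x \ge 0, \\ 0, & x < 0. \end{cases} \]
Let $X_1$, $X_2$ be independent with density $f_\theta$; write $Y_i = e^{-X_i}$. The hypotheses
can then be rewritten as $H_1$: $Y_i$ have density $f_1(y)$ against $H_2$: $Y_i$ have density
$f_2(y)$, where
\[ (2.4) \qquad f_1(y) = \begin{cases} 1, & 0 < y < 1, \\ 0 & \text{otherwise,} \end{cases} \qquad f_2(y) = \begin{cases} 2y, & 0 < y < 1, \\ 0 & \text{otherwise.} \end{cases} \]
Thus, $\tfrac{1}{2} f_2(y)/f_1(y) = y$, and we may write the general form of the GSPRT as:
\[ (2.5) \qquad \begin{cases} \text{If } Y_1 \le a \text{ (resp., } \ge b\text{), stop and accept } H_1 \text{ (resp., } H_2\text{);} \\ \text{If } a < Y_1 < b \text{ and } Y_1 Y_2 \le k \text{ (resp., } > k\text{), accept } H_1 \text{ (resp., } H_2\text{).} \end{cases} \]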

Here we may assume $a$, $b$, $k$ lie between 0 and 1 inclusive. If $a = b$, $k$ is of no
importance. If $a < b$, we may suppose $0 < k < 1$, since $k = 0$ or $k = 1$ is clearly
inadmissible (replace $b$ by $b' = a$ or $a$ by $a' = b$, respectively, to obtain better
procedures); also, since $Y_1 Y_2 \le Y_1$ with probability one under both $H_1$
and $H_2$, we may suppose $k \le a$, since $k > a$ is clearly inadmissible (replace $a$
by $a' = \min(k, b)$ for a better procedure). All procedures with $a = b$ are
admissible. To summarize, then, in investigating which tests are admissible, we
may eliminate certain trivial cases mentioned above and hereafter assume
\[ (2.6) \qquad 0 < k \le a < b \le 1. \]


The characteristics of any such procedure are easily computed and may be
summarized in the risk vector of any such procedure, which is given by the
quadruple
\[ (2.7) \qquad r(a, b, k) = \{P_1(\text{accept } H_2),\ P_2(\text{accept } H_1),\ P_1(n = 2),\ P_2(n = 2)\} \]
\[ = \Big(1 - a - k\log\frac{b}{a},\ \ a^2 + 2k^2\log\frac{b}{a},\ \ b - a,\ \ b^2 - a^2\Big). \]
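The risk vector of a procedure $(a, b, k)$ satisfying (2.6) can be checked numerically. The sketch below is illustrative only: it compares the closed form $(1 - a - k\log(b/a),\ a^2 + 2k^2\log(b/a),\ b - a,\ b^2 - a^2)$ with a midpoint-rule integration over $Y_1$, using the densities $f_1(y) = 1$ and $f_2(y) = 2y$ on (0, 1). The function names and the step count are hypothetical choices.

```python
import math

def risk_vector(a, b, k):
    """Closed-form risk quadruple, valid for 0 < k <= a < b <= 1."""
    c = math.log(b / a)
    return (1 - a - k * c, a * a + 2 * k * k * c, b - a, b * b - a * a)

def risk_vector_numeric(a, b, k, steps=20000):
    """Same quadruple by midpoint integration over Y1 in (a, b)."""
    h = (b - a) / steps
    p1_accept_h1 = a          # P1(Y1 <= a), density 1 under H1
    p2_accept_h1 = a * a      # P2(Y1 <= a), density 2y under H2
    for i in range(steps):
        y = a + (i + 0.5) * h
        t = min(1.0, k / y)   # Y1*Y2 <= k  iff  Y2 <= k / y
        p1_accept_h1 += t * h                # P1(Y2 <= t) = t
        p2_accept_h1 += 2 * y * t * t * h    # P2(Y2 <= t) = t^2
    return (1 - p1_accept_h1, p2_accept_h1, b - a, b * b - a * a)
```

Evaluating both functions at, say, $(a, b, k) = (0.3, 0.8, 0.2)$ gives the same four numbers up to the integration error.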


The question of inadmissibility or admissibility of such a procedure is then that
of whether or not there exists a test $(\hat a, \hat b, \hat k)$ for which all components of $r(\hat a, \hat b, \hat k)$
are $\le$ the corresponding ones of $r(a, b, k)$, with strict inequality in at least
one component. Since no two different tests have identical risk functions (see
Section 1), inadmissibility of $(a, b, k)$ is equivalent to (I) the existence of $(\hat a,
\hat b, \hat k)$ not identical to $(a, b, k)$ and with (II) all components of $r(\hat a, \hat b, \hat k)$ $\le$ the
corresponding ones of $r(a, b, k)$. The latter condition (II) may be written


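\[ (2.8) \qquad \begin{aligned} &\text{(a)}\quad \hat b - \hat a \le b - a, \\ &\text{(b)}\quad \hat b^2 - \hat a^2 \le b^2 - a^2, \\ &\text{(c)}\quad \hat a + \hat k \log(\hat b/\hat a) \ge a + k\log(b/a), \\ &\text{(d)}\quad \hat a^2 + 2\hat k^2 \log(\hat b/\hat a) \le a^2 + 2k^2\log(b/a). \end{aligned} \]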


The possibility that $\hat b = \hat a$ may be eliminated in all that follows: if $\hat b = \hat a$, squaring
both sides of (c) and comparing with (d) yields $(a + k\log(b/a))^2 \le a^2 + 2k^2
\log(b/a)$; i.e., $k\log(b/a) \le 2(k - a)$, which is impossible. Thus, we may
hereafter assume $\hat a < \hat b$. Also, $\hat a \ge \hat k > 0$ for admissibility. Thus, in particular,
$0 < \log(\hat b/\hat a) < \infty$ in all that follows.
Combining (2.8) (c) and (d), we obtain
\[ (2.9) \qquad \Big[\max\Big(0,\ \frac{a - \hat a + k\log(b/a)}{\log(\hat b/\hat a)}\Big)\Big]^2 \le \hat k^2 \le \frac{a^2 - \hat a^2 + 2k^2\log(b/a)}{2\log(\hat b/\hat a)}. \]
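In particular, the right-hand term of (2.9) must be $> 0$. Thus, for a given
$\hat a$, $\hat b$ with $0 < \hat a < \hat b \le 1$, (2.9) can be satisfied for some $\hat k$ with $0 < \hat k \le \hat a < \hat b$
if and only if either
\[ (2.10) \qquad \frac{a - \hat a + k\log(b/a)}{\log(\hat b/\hat a)} \le 0 < \frac{a^2 - \hat a^2 + 2k^2\log(b/a)}{2\log(\hat b/\hat a)} \]
or else
\[ (2.11) \qquad \begin{aligned} &\text{(a)}\quad 0 < \frac{a - \hat a + k\log(b/a)}{\log(\hat b/\hat a)} \le \hat a \quad\text{and} \\ &\text{(b)}\quad \Big(\frac{a - \hat a + k\log(b/a)}{\log(\hat b/\hat a)}\Big)^2 \le \frac{a^2 - \hat a^2 + 2k^2\log(b/a)}{2\log(\hat b/\hat a)}. \end{aligned} \]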


Equation (2.10) implies
\[ (a + k\log(b/a))^2 \le \hat a^2 < a^2 + 2k^2\log(b/a), \]
the extreme members of which give $2a < 2k - k\log(b/a)$, an impossibility.
Since the right side of (2.11) (b) is positive, we also see that the first inequality
of (2.11) (a) is implied by (2.11) (b): otherwise, we would again have the
contradiction $2a < 2k - k\log(b/a)$. Thus, (2.8) (c) and (2.8) (d) may be satisfied
for some $\hat a$, $\hat k$, $\hat b$ with $0 < \hat k \le \hat a < \hat b \le 1$ if and only if (2.11) (b) and the second
half of (2.11) (a) may be satisfied for some $\hat a$, $\hat b$ with $0 < \hat a < \hat b \le 1$. Write
\[ c = \log(b/a), \qquad \hat c = \log(\hat b/\hat a), \qquad \lambda = k/a, \qquad \gamma = \hat a/a. \]





Equations (2.8) (a), (b) may be written
\[ (2.12) \qquad \begin{aligned} &\text{(a)}\quad \hat c \le \log[1 + (e^{c} - 1)/\gamma], \\ &\text{(b)}\quad \hat c \le \tfrac{1}{2}\log[1 + (e^{2c} - 1)/\gamma^2]. \end{aligned} \]
Since $c, \hat c > 0$, equation (2.12) (a) implies or is implied by (2.12) (b) according
to whether $\gamma \le 1$ or $\gamma \ge 1$. We may write the restriction $\hat b \le 1$ as
\[ (2.13) \qquad \hat c \le c - \log b\gamma. \]
Equation (2.12) (a) implies (2.13) if $\gamma \le 1$, since $b \le 1$. If $\gamma > 1$, (2.12) (b)
implies or is implied by (2.13) according to whether or not $\gamma \le [1 + e^{2c}(b^{-2} - 1)]^{1/2}$.

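To summarize, then, (2.11), (2.12), and (2.13) imply that a given $(k, a, b)$
(and hence, $(c, \lambda, b)$) is inadmissible if and only if there exist positive numbers
$\hat c$, $\gamma$ with either $\hat c \ne c$ or $\gamma \ne 1$ (note from (2.9) that $\hat c = c$, $\gamma = 1$ imply $\hat k = k$
and hence $(\hat a, \hat b, \hat k) = (a, b, k)$) and satisfying
\[ (2.14) \qquad \begin{aligned} &\text{(a)}\quad f_1(\gamma) \le \hat c \le f_2(\gamma), \\ &\text{(b)}\quad \hat c \ge f_3(\gamma), \\ &\text{(c)}\quad \gamma < \sqrt{1 + 2\lambda^2 c}, \end{aligned} \]
where
\[ (2.15) \qquad \begin{aligned} f_1(\gamma) &= (1 + \lambda c - \gamma)/\gamma, \\ f_2(\gamma) &= \begin{cases} \log[1 + (e^{c} - 1)/\gamma] & \text{if } \gamma \le 1, \\ \tfrac{1}{2}\log[1 + (e^{2c} - 1)/\gamma^2] & \text{if } 1 \le \gamma \le [1 + e^{2c}(b^{-2} - 1)]^{1/2}, \\ c - \log b\gamma & \text{if } \gamma \ge [1 + e^{2c}(b^{-2} - 1)]^{1/2}, \end{cases} \\ f_3(\gamma) &= 2(1 - \gamma + \lambda c)^2/(1 - \gamma^2 + 2\lambda^2 c), \end{aligned} \]
and where condition (2.14) (c) merely expresses the positivity of the right side
of (2.11) (b).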


Suppose $\lambda < 1$ (the case $\lambda = 1$ can be treated easily). It is evident that $f_1(1) =
\lambda c < c = f_2(1) = f_3(1)$ and that all points $(\gamma, \hat c)$ with $|\gamma - 1|$ and $|\hat c - c|$ of
sufficiently small magnitude and for which $\hat c \le f_2(\gamma)$, satisfy (2.14) (a) (and,
obviously, (2.14) (c)). Hence, primes denoting derivatives, a necessary condition
for $(a, b, k)$ to be admissible is that
\[ (2.16) \qquad f_2'(1-) \ge f_3'(1) \ge f_2'(1+). \]
Evaluating (2.16) from (2.15), we obtain
\[ (2.17) \qquad 1 - e^{-c} \le (2\lambda - 1)/\lambda^2 \le 1 - e^{-2c}. \]





On the other hand, the necessary condition $\sup a_n \le \inf b_n$ of (2.2) may be
written $1/2 \le \lambda \le e^{c}/2$, or (since $\lambda < 1$)
\[ (2.18) \qquad 0 \le (2\lambda - 1)/\lambda^2 \le \begin{cases} 4(e^{c} - 1)e^{-2c} & \text{if } e^{c} \le 2, \\ 1 & \text{if } e^{c} \ge 2. \end{cases} \]
Clearly, (2.18) includes many procedures not included in (2.17) (hence, not in
the complement of the set of procedures described by (2.14)). Thus, equation
(2.2) is not sufficient for admissibility.
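The gap between the two necessary conditions can be seen with a small numerical sketch. It is an illustration under stated assumptions: (2.17) is taken in the form $1 - e^{-c} \le (2\lambda - 1)/\lambda^2 \le 1 - e^{-2c}$, and the condition from (2.2) in the form $1/2 \le \lambda \le e^{c}/2$, with $\lambda = k/a$ and $c = \log(b/a)$. The function names are hypothetical.

```python
import math

def satisfies_2_17(lam, c):
    """Local necessary condition for admissibility, read as (2.17)."""
    mid = (2 * lam - 1) / lam ** 2
    return 1 - math.exp(-c) <= mid <= 1 - math.exp(-2 * c)

def satisfies_2_2(lam, c):
    """Necessary condition sup a_n <= inf b_n, written in terms of lam and c."""
    return 0.5 <= lam <= math.exp(c) / 2

if __name__ == "__main__":
    for lam in (0.6, 0.7, 0.9):
        print(lam, satisfies_2_2(lam, 1.0), satisfies_2_17(lam, 1.0))
```

With $c = 1$, the values $\lambda = 0.6$ and $\lambda = 0.9$ pass the condition from (2.2) but fail (2.17), while $\lambda = 0.7$ passes both; so procedures of the first kind satisfy (2.2) without being admissible.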

We shall not consider this example further: it is already evident from (2.14) 


that, even in simple cases, the delimitation of the admissible procedures can 
become complicated. 


3. Controlling the distribution of the sample size under distributions other
than those being tested. In this section we shall characterize (under certain
assumptions) an essentially complete class of tests (the risk function is given
below) for testing $H_0: f = f_{\theta_0}$ against $H_1: f = f_{\theta_{K+1}}$ sequentially, where the test
is based on independent random variables $X_1, X_2, \cdots$ with common density
$f_\theta$ with respect to some $\sigma$-finite measure $\mu$. There are specified values $K$ and
$\theta_1, \cdots, \theta_K$, as well as non-negative numbers $a_0, a_1, \cdots, a_{K+1}$ whose sum is
unity. The "risk function" of a procedure consists of the vector
\[ (3.1) \qquad \Big\langle P_{\theta_0}\{\text{accept } H_1\},\ P_{\theta_{K+1}}\{\text{accept } H_0\},\ \sum_{i=0}^{K+1} a_i P_{\theta_i}\{n \ge j\},\ j = 0, 1, 2, \cdots \Big\rangle \]
where $n$ is the (chance) number of observations required. We consider only
procedures for which $P_{\theta_i}\{n < \infty\} = 1$ for all $i$. One procedure is said to be at
least as good as a second one if each component of (3.1) for the first is no greater
than the corresponding component for the second, and the notion of essential
completeness is relative to this definition of "as good as."

We assume in this section that the $f_{\theta_i}$ ($0 \le i \le K + 1$) are finite everywhere
and have the same region of positivity and, writing $p_{im}(x^{(m)}) = \prod_{j=1}^{m} f_{\theta_i}(x_j)$
with $x^{(m)} = (x_1, \cdots, x_m)$, that the functions $p_{im}(x^{(m)})/p_{0m}(x^{(m)})$ and
$p_{(K+1)m}(x^{(m)})/p_{im}(x^{(m)})$ are for $1 \le i \le K$ strictly increasing functions of
$p_{(K+1)m}(x^{(m)})/p_{0m}(x^{(m)})$ on the domain of positivity of $p_{0m}$ (of course, this
means that either of the first two ratios can increase as the argument changes
from one value to another, only if the last ratio also increases); thus, the results
of this section apply to the case $f_\theta(x) = c(\theta)e^{\theta x}h(x)$ with $\theta_0 < \theta_1 < \cdots < \theta_{K+1}$.
Our result is

THEOREM 3. An essentially complete class for the above problem consists of those
procedures which at the outset randomize between accepting $H_0$, accepting $H_1$,
and taking a first observation, and which thereafter are GSPRT's for testing $H_0$
against $H_1$ (with appropriate randomization rules on the boundaries if $\mu$ is not
atomless).

PROOF: A trivial modification of the argument of LeCam [2] (there are two
decisions, $K + 2$ states of nature here) shows that an essentially complete class
can be obtained by taking the closure in the sense of regular convergence (see
[2], [3]) of all Bayes strategies for the problem of minimizing
\[ (3.2) \qquad \xi_0\,a\,P_{\theta_0}(\text{accept } H_1) + \xi_{K+1}\,b\,P_{\theta_{K+1}}(\text{accept } H_0) + \sum_{i=0}^{K+1} \xi_i E_{\theta_i} C_i(n) \]
for all $a > 0$, $b > 0$, $C_i(m)$ strictly increasing in $m$ and approaching infinity
with $m$ if $i$ is such that $a_i > 0$ and $C_i(m) \equiv 0$ if $a_i = 0$ (actually, $C$ need not
depend on $i$ here, but must for the considerations of Remark 1 below) and all
a priori probability measures $(\xi_0, \xi_1, \cdots, \xi_{K+1})$. Each such Bayes strategy is
characterized by an initial randomization of the type described in the statement
of the theorem and by a sequence $\{D_{im}\}$ ($i = 0, 1$; $m = 1, 2, \cdots$) of closed
convex subsets of the $(K + 1)$-simplex $S$ (whose elements will be described by
$K + 2$ nonnegative barycentric coordinates whose sum is unity) with the
property that, $Q_i$ denoting the point of $S$ whose $i$th barycentric coordinate is
unity, $Q_0 \in \operatorname{int} D_{0m}$ (int denoting interior in the usual topology of $S$), $Q_{K+1} \in \operatorname{int} D_{1m}$,
these interiors are (for each $m$) disjoint, and for $1 \le i \le K$, $Q_i \in D_{0m} \cap D_{1m}$ and
$Q_i \in \operatorname{int}(D_{0m} \cup D_{1m})$ (see Chapter 4 of [3] or the paragraph following Lemma 4.1
of the present paper for details of arguments yielding these conclusions). The
Bayes strategy relative to an a priori probability measure $\xi = (\xi_0, \cdots, \xi_{K+1})$
is then (after some initial randomization as described above) to compute the
point $\zeta^{(m)} = \zeta^{(m)}(x^{(m)})$ of $S$ whose $j$th component is $\xi_j p_{jm}(x^{(m)}) / \sum_i \xi_i p_{im}(x^{(m)})$
($j = 0, \cdots, K + 1$) after $m$ observations and to accept $H_0$, accept $H_1$, or take
another observation according to whether $\zeta^{(m)} \in \operatorname{int} D_{0m}$, $\zeta^{(m)} \in \operatorname{int} D_{1m}$, or
$\zeta^{(m)} \in S - D_{0m} - D_{1m}$, with some sort of randomization if $\zeta^{(m)}$ is in the boundary
of one or more $D_{im}$ (under our assumptions, if $\mu$ is atomless, randomization is
actually unnecessary).

Since the class of procedures described in the theorem is compact and closed
in the sense of regular convergence (see [2]), the theorem will be proved if we
show each Bayes strategy has the structure enunciated in the theorem. But
if this is not true, there are values of $\xi$, $a$, $b$, the functions $C_i$, and a number
$n > 0$ and values $x^{(n)}$, $y^{(n)}$ with $\xi^{(n)}(x^{(n)}) \ne \xi^{(n)}(y^{(n)})$, such that $\xi_0 > 0$ and
$\xi_{K+1} > 0$ (otherwise, $P\{n = 0\} = 1$ for any Bayes strategy) and such that

(3.3) (a) $\xi^{(n)}(x^{(n)}) \in D_{1n}$,

(b) $\xi^{(n)}(y^{(n)}) \notin \text{int } D_{1n}$,

(c) $p_{(K+1)n}(y^{(n)}) / p_{0n}(y^{(n)}) > p_{(K+1)n}(x^{(n)}) / p_{0n}(x^{(n)})$,

or else there is a similar situation for $D_{0n}$ (which is handled similarly). Now, the
convex subset of $S$ spanned by $\xi^{(n)}(x^{(n)}), Q_1, \ldots, Q_{K+1}$ is a subset of $D_{1n}$ which
consists of those points $w = (w_0, w_1, \ldots, w_{K+1})$ of $S$ for which

(3.4) $w_i t_0 \ge w_0 t_i$ for all $i > 0$ for which $t_i > 0$,

where $\xi^{(n)}(x^{(n)}) = (t_0, t_1, \ldots, t_{K+1})$. Hence, writing $\xi^{(n)}(y^{(n)}) = (z_0, z_1, \ldots, z_{K+1})$,
(3.3) (b) would imply that $z_i t_0 \le z_0 t_i$ for some $i > 0$ for which $t_i$ (hence, $\xi_i$) $> 0$.





66 J. KIEFER AND LIONEL WEISS 


Thus, for that $i$ (since also $\xi_0 > 0$), we have

(3.5) $p_{in}(y^{(n)}) / p_{0n}(y^{(n)}) \le p_{in}(x^{(n)}) / p_{0n}(x^{(n)})$.


Since (3.3) (c) implies (by the assumption of this section) the negation of (3.5), 
we obtain a contradiction and the theorem is proved. 

Remarks: 1. Essentially the same proof works to show that the essentially
complete class of Theorem 3 is essentially complete for the more general problem
where the components of the risk function are

$P_{\theta_0}\{\text{accept } H_1\}, \quad P_{\theta_{K+1}}\{\text{accept } H_0\}, \quad \sum_{i=0}^{K+1} a_{ij} P_{\theta_i}\{n = j\}$ (one component for each value of $j$),

the $a_{ij}$ being given non-negative constants. One can also treat the case where the
finite linear combination in (3.1) is to be replaced by $\int P_\theta\{n = j\}\, dA(\theta)$ where
$A$ is a probability measure on a suitable family $\{P_\theta\}$ of probability measures.
These considerations have obvious applications to practical problems where
the $a_i$ (or $A$) represent the probability distribution of the process parameter.
2. Theorem 3 can also be proved using the method of [1]; in fact, a proof of 
Theorem 3 can be obtained essentially by going through the proof in [1] and 
replacing P, and P, there by >> a;Ps, and making other obvious similar altera- 
tions. 
3. In cases like those of Lemma 4.2 below other than (3), the assumptions 
of the present section are not satisfied; however, such cases can also be treated 
here with only minor modifications of the above analysis. 


4. Procedures minimizing $E n$ at a third point, etc. Let $f_{-1}, f_0, f_1$ be three
densities with respect to a $\sigma$-finite measure $\mu$. We assume no two of the $f_i$ are
identical almost everywhere ($\mu$). Let $X_1, X_2, \ldots$ be independently and identically
distributed random variables with common density $f$ with respect to $\mu$.
It is desired to test between the hypotheses $H_{-1}: f = f_{-1}$ and $H_1: f = f_1$. Let
$\alpha_i(\delta)$ denote the probability that the procedure $\delta$ terminates with an incorrect
decision when $H_i$ is true ($i = \pm 1$). Let $\alpha_i^*$ be specified numbers satisfying
$0 < \alpha_i^* < 1$ ($i = \pm 1$). Let $A_0(\delta)$ denote the expected value of $n$ (the number
of observations which have been taken at termination) when $f = f_0$ and $\delta$ is
used. Our purpose here is to characterize procedures $\delta$ which, among all procedures
satisfying

(4.1) $\alpha_i(\delta) \le \alpha_i^*$ ($i = \pm 1$),

minimize $A_0(\delta)$. Under suitable assumptions (those of Sections 3 and 4 differ),
the class of procedures delimited in Theorem 3 will evidently contain the procedures
which do this, but we shall obtain here a much finer characterization of
them. For the remainder of this section we shall term such procedures "optimum."
To avoid trivialities, we hereafter assume $\alpha_{-1}^* + \alpha_1^* < 1$.

We first note a fairly obvious property of optimum procedures. Let $\Gamma$ be the
set of points in three-space of the form $(\alpha_{-1}(\delta), \alpha_1(\delta), A_0(\delta))$ for all possible $\delta$
(not merely those satisfying (4.1)). Since one can randomize between two procedures
at the outset, $\Gamma$ is clearly convex. (The existence of points $(a, b, c)$ with
$c < \infty$ for any $a, b > 0$ follows from consideration of fixed-sample-size procedures.
A convex combination of points giving positive weight to a point with
$c = \infty$ will itself have $c = \infty$.) For any procedure $\delta$ satisfying (4.1) and with
strict inequality for either $i = 1$ or $i = -1$, we may (by randomizing between
$\delta$ and a procedure requiring no observations) obtain a $\delta'$ satisfying (4.1) and for
which $A_0(\delta') < A_0(\delta)$; we may therefore restrict our search for optimum procedures
to those $\delta$ for which equality holds in (4.1). Among the class of all such
procedures there exists one minimizing $A_0(\delta)$, a consequence of Theorem 3.1 of
Wald [3]. Let $e(\alpha_{-1}^*, \alpha_1^*) = \min_\delta A_0(\delta)$, the minimum being taken subject to
(4.1) (with equality). For all $\epsilon > 0$ with $\epsilon < \min(\alpha_{-1}^*, \alpha_1^*)$ we have (recalling
$\alpha_1^* + \alpha_{-1}^* < 1$) that $e(\alpha_{-1}^* - \epsilon, \alpha_1^* - \epsilon) > e(\alpha_{-1}^*, \alpha_1^*)$; for otherwise, if equality
held, a randomization of the type noted parenthetically above would produce
a $\delta$ satisfying (4.1) and for which $A_0(\delta) < e(\alpha_{-1}^*, \alpha_1^*)$, a contradiction. Since
$e(\alpha_{-1}^* - \epsilon, \alpha_1^* - \epsilon) > e(\alpha_{-1}^*, \alpha_1^*)$, and since for any value $e > 0$ the points
$(0, 1, e)$ and $(1, 0, e)$ are clearly in $\Gamma$, it is clear that $\Gamma$ cannot be supported at
$(\alpha_{-1}^*, \alpha_1^*, e(\alpha_{-1}^*, \alpha_1^*))$ by a plane any of whose direction cosines is zero. Since $\Gamma$
obviously can be supported at this point by a plane with non-negative direction
cosines, we have


Lemma 4.1. Any optimum procedure must, for some positive $\xi_{-1}, \xi_1, \xi_0$, minimize

(4.2) $\xi_{-1}\alpha_{-1}(\delta) + \xi_1\alpha_1(\delta) + \xi_0 A_0(\delta)$

among all procedures $\delta$. (Conversely, any procedure minimizing (4.2) for some
positive $\xi_i$'s is obviously optimum for some $\alpha_i^*$'s.)

Thus, necessary conditions on optimum procedures may be found by characterizing
"Bayes solutions" which minimize (for a given $\xi_{-1}, \xi_1, \xi_0$, all positive,
and whose sum we may take to be unity) the "integrated risk" (4.2). Results
like Theorems 4.8, 4.9, and 4.10 of Wald [3] (see also [4]) are easily seen to be
valid in the present case (with the two values of the loss function and cost of
experimentation altered from their unit values in (4.2) if desired). To summarize
what we need of these results, all procedures minimizing (4.2) for all possible
$\xi = (\xi_{-1}, \xi_0, \xi_1)$ with $\xi_i \ge 0$, $\sum \xi_i = 1$, are characterized in the 2-simplex in
barycentric coordinates by two closed convex regions $C_{-1}$ and $C_1$ as follows:
after $m$ observations ($m = 0, 1, 2, \ldots$) compute the a posteriori probability
measure $\xi^{(m)}$ for the given a priori measure $\xi = \xi^{(0)}$ and the observed values
of $X_1, \ldots, X_m$. Accept $H_{-1}$, accept $H_1$, or take another observation according
to whether $\xi^{(m)}$ lies in the interior of $C_{-1}$, $C_1$, or the complement of $C_{-1} \cup C_1$;
on the boundaries between regions, a Bayes solution may randomize in any
way (depending on $X_1, \ldots, X_m$ if desired) between (or among) appropriate
actions. We now describe the $C_i$. Let $V_i$ be the point where $\xi_i = 1$ ($i = 0, \pm 1$).
A point $\xi$ of $C_i$ will be called an interior or boundary point of $C_i$ according to
whether or not every Bayes solution with respect to $\xi$ immediately accepts $H_i$
with probability one. Clearly (see p. 121 of [3]) $V_i$ is in the boundary of $C_i$ for
$i = \pm 1$, and a line segment $V_0 P$ (of positive length) of the line $\xi_1 = \xi_{-1}$ is the
intersection $C_1 \cap C_{-1}$; the curve $V_0 P V_i$ is the boundary of $C_i$; on the segment
$V_0 P$, except at the point $P$ where one may randomize among accepting $H_1$ or
$H_{-1}$ or taking another observation, a Bayes solution must stop with probability
one and randomize between accepting $H_1$ or $H_{-1}$ (analogous to Theorem 4.9
of [3]). Of course, a necessary condition for a Bayes solution minimizing (4.2)
to be optimum (for some $\alpha_i^*$) is that it stop with probability one whenever
$\xi^{(m)} = V_1$ or $V_{-1}$; we hereafter consider only Bayes solutions of this nature.
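
The a posteriori computation used by these Bayes solutions can be illustrated numerically. The following is a minimal sketch, assuming unit-variance normal densities with means $-1, 0, 1$ for $f_{-1}, f_0, f_1$ and a uniform a priori measure; the function names `normal_pdf` and `posterior` are hypothetical and not taken from the paper. It computes only the point $\xi^{(m)}$ of the 2-simplex; the regions $C_{-1}$ and $C_1$ themselves are not computed here.

```python
import math


def normal_pdf(x, mean):
    """Unit-variance normal density (an illustrative choice for the f_i)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def posterior(prior, densities, observations):
    """Return the a posteriori weights after the given observations.

    prior      -- dict mapping a hypothesis index (-1, 0, 1) to its a priori weight
    densities  -- dict mapping the same indices to density functions f_i
    """
    weights = dict(prior)
    for x in observations:
        # Multiply each weight by the likelihood of x, then renormalize.
        weights = {i: w * densities[i](x) for i, w in weights.items()}
        total = sum(weights.values())
        weights = {i: w / total for i, w in weights.items()}
    return weights


# Assumed example: means -1, 0, 1 and a uniform a priori measure.
densities = {i: (lambda x, m=float(i): normal_pdf(x, m)) for i in (-1, 0, 1)}
prior = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}
post = posterior(prior, densities, [0.8, 1.1, 0.4])
```

With these positive observations the weight moves toward the vertex $V_1$; a full Bayes solution would then compare the resulting point with the regions $C_{-1}$ and $C_1$.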

In order to obtain a more detailed characterization, we now introduce certain
assumptions. Write $x^{(m)} = (x_1, \ldots, x_m)$ and $p_{im}(x^{(m)}) = f_i(x_1) f_i(x_2) \cdots f_i(x_m)$
for $m > 0$ and $= 1$ for $m = 0$.

Assumption A. For each $m$ and $x^{(m)}$, $y^{(m)}$, if

(4.3) $p_{1m}(x^{(m)})\, p_{-1m}(y^{(m)}) \ge p_{1m}(y^{(m)})\, p_{-1m}(x^{(m)})$,

then

(4.4) $p_{0m}(x^{(m)})\, p_{-1m}(y^{(m)}) \ge p_{0m}(y^{(m)})\, p_{-1m}(x^{(m)})$

and

(4.5) $p_{1m}(x^{(m)})\, p_{0m}(y^{(m)}) \ge p_{1m}(y^{(m)})\, p_{0m}(x^{(m)})$;

and strict inequality in (4.3) with both sides positive implies strict inequality
in (4.4) and (4.5).

Assumption B. For each $x_1$, if

(4.6) $f_1(x_1) \ge f_{-1}(x_1)$,

then

(4.7) $f_0(x_1) \ge f_{-1}(x_1)$;

and, if

(4.8) $f_{-1}(x_1) \ge f_1(x_1)$,

then

(4.9) $f_0(x_1) \ge f_1(x_1)$.

Assumption C:

(4.10) $\lim_{m \to \infty} \sup_{x^{(m)}} \dfrac{\min[p_{-1m}(x^{(m)}),\, p_{1m}(x^{(m)})]}{p_{0m}(x^{(m)})} = 0$,

where the supremum is taken over those $x^{(m)}$ for which the denominator is positive.

Of course, if there is a value $x_1$ for which $f_{-1}(x_1) = f_1(x_1) = f_0(x_1) > 0$ (which
will usually not be so), then Assumption B follows from Assumption A. Assumptions
A and B are related to the monotone likelihood ratio assumption which
occurs elsewhere in certain fixed-sample-size problems in statistics (e.g., [5]),
but are not quite in that form, which (for a parametric class: put $\theta = 0, \pm 1$
in our case) states (for $n = 1$) $f_{\theta_1}(x) f_{\theta_2}(y) \ge f_{\theta_2}(x) f_{\theta_1}(y)$ if $\theta_1 \ge \theta_2$, $x \ge y$. In
Assumption A the $x^{(m)}$ are not necessarily simply-ordered (although a simple
ordering of certain equivalence classes for the purpose of the present discussion
will often be obvious), unlike the monotone likelihood ratio case; and, at least
for fixed $n$, it is easy to see by examples that neither of Assumption B and the
monotone assumption implies the other.
Before proceeding to the consequences of Assumptions A, B, and C, we note 
that they will be satisfied in many important cases (see also Remark 12 below): 
Lemma 4.2. If $f_i(x) = f(x, \theta_i)$ ($i = 0, \pm 1$) with $\theta_{-1} < \theta_0 < \theta_1$ (the inequalities
may be reversed), then Assumptions A, B, and C are satisfied if $f(x, \theta)$ is (for
example) of any of the following forms:

(1) $f(x, \theta) = e^{-(x - \theta)}$ for $x \ge \theta$, and $= 0$ otherwise
($\mu$ = Lebesgue measure);

(2) $f(x, \theta) = \theta^{-1}$ for $0 < x < \theta$, and $= 0$ otherwise
($0 < \theta < \infty$, $\mu$ = Lebesgue measure);

(3) $f(x, \theta) = r(\theta) e^{\theta x}$ (Koopman-Darmois)
($\mu$ any $\sigma$-finite measure not giving all measure to one point, $r^{-1}(\theta) = \int e^{\theta x}\, d\mu(x)$,
$\theta$ any value for which $r^{-1}(\theta) < \infty$).

We remark that the case $f(x, \theta) = s(\theta) e^{g(\theta) x}$ with $g$ strictly monotone can be
reduced to case (3).
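
As a numerical illustration of case (3), the implication in Assumption A can be checked on a finite grid. The sketch below assumes unit-variance normal densities with means $-1, 0, 1$ (a Koopman-Darmois family); the names `THETAS`, `log_p`, and `assumption_a_holds` are hypothetical. It works with logarithms, so that (4.3) to (4.5) become sign conditions on differences of log-likelihoods, and it is a spot check on finitely many points rather than a proof.

```python
import itertools

THETAS = {-1: -1.0, 0: 0.0, 1: 1.0}  # assumed parameter values for f_{-1}, f_0, f_1


def log_p(i, xs):
    """log p_{im}(x^(m)) for unit-variance normal densities, up to an additive
    constant that is the same for every i and cancels in (4.3)-(4.5)."""
    return sum(-0.5 * (x - THETAS[i]) ** 2 for x in xs)


def assumption_a_holds(points, m, tol=1e-9):
    """Check that (4.3) implies (4.4) and (4.5), strict part included, on a grid."""
    for xs in itertools.product(points, repeat=m):
        for ys in itertools.product(points, repeat=m):
            d43 = log_p(1, xs) + log_p(-1, ys) - log_p(1, ys) - log_p(-1, xs)
            d44 = log_p(0, xs) + log_p(-1, ys) - log_p(0, ys) - log_p(-1, xs)
            d45 = log_p(1, xs) + log_p(0, ys) - log_p(1, ys) - log_p(0, xs)
            # Weak form: (4.3) holding must give (4.4) and (4.5).
            if d43 >= -tol and (d44 < -tol or d45 < -tol):
                return False
            # Strict form: strict inequality in (4.3) must give strict (4.4), (4.5).
            if d43 > tol and (d44 <= tol or d45 <= tol):
                return False
    return True


grid = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
ok = assumption_a_holds(grid, m=2)
```

For this family all three differences are positive multiples of $\sum_i (x_i - y_i)$, which is why the check succeeds.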

Proof: Cases (1) and (2) are easy to verify directly (note that the last part
of Assumption A is vacuous here). In case (3), we have $\prod_i f(x_i, \theta) / f(y_i, \theta) = e^{\theta z}$
where $z = \sum_i (x_i - y_i)$; Assumption A follows at once. Next, we note
(differentiating under the integral sign where necessary, which is easily justified
for any $\theta \in L$, where $L$ is the interior of the interval of values $\theta$ for which
$r^{-1}(\theta) < \infty$) that, for $\theta \in L$, we have $d^2 \log r(\theta) / d\theta^2 = (E_\theta X)^2 - E_\theta X^2 < 0$,
where $E_\theta g(X)$ denotes the expected value of $g(X)$ when $X$ has density $f(x, \theta)$.
Hence, $-\log r(\theta)$ is strictly convex over the interval of $\theta$ for which $r^{-1}(\theta) < \infty$.

Putting $f_i(x) = f(x, \theta_i)$ with $\theta_{-1} < \theta_0 < \theta_1$, equation (4.6) is equivalent to

(4.11) $\dfrac{1}{\theta_1 - \theta_{-1}} \log \dfrac{r(\theta_1)}{r(\theta_{-1})} \ge -x_1$,

while (4.7) is equivalent to the expression obtained from (4.11) by substituting
$\theta_0$ for $\theta_1$. Hence, we will have shown that (4.6) implies (4.7) if we show that
$q(\theta) = (\theta - \theta_{-1})^{-1} \log [r(\theta) / r(\theta_{-1})]$ is monotonically nonincreasing in $\theta$ for
$\theta > \theta_{-1}$. Thus, it suffices to show, for $\theta > \theta_{-1}$, that $b(\theta) \le 0$, where
$b(\theta) = (\theta - \theta_{-1})^2\, dq(\theta) / d\theta = -\log [r(\theta) / r(\theta_{-1})] + (\theta - \theta_{-1})\, d \log r(\theta) / d\theta$. Since
$b(\theta_{-1}) = 0$, it suffices to show, for $\theta > \theta_{-1}$, that $0 \ge db(\theta) / d\theta$. Since
$db(\theta) / d\theta = (\theta - \theta_{-1})\, d^2 \log r(\theta) / d\theta^2$, the desired result follows from that of the previous
paragraph. The proof that (4.8) implies (4.9) is similar (or may be obtained
from the preceding argument by replacing $\theta$ and $x$ by $-\theta$ and $-$

Finally, let $p$ be defined by $\theta_0 = p\theta_1 + (1 - p)\theta_{-1}$. Clearly, $0 < p < 1$. Because
of the strict convexity of $-\log r(\theta)$ we have, for some number $h$ with
$0 < h < 1$ and for all $x$,

A few remarks can be made about restricted product problems in general. They
are mainly consequences of the fact that the risk function of $\delta = (\delta', \delta'')$ is given
by

(6.2) $R_\delta(\theta) = R_{\delta'}(\theta) + R_{\delta''}(\theta)$

and are also valid for the slightly more general case

$R_\delta(\theta) = p R_{\delta'}(\theta) + (1 - p) R_{\delta''}(\theta)$.

(i) Bayes solutions. We mention first the following result, which was previously
noted by Duncan [3]. Let $\delta_0'$ and $\delta_0''$ be Bayes solutions of two component problems
with respect to a common a priori distribution $\lambda$, and suppose that $(\delta_0', \delta_0'')$ is
compatible with the given set of restrictions. Then $(\delta_0', \delta_0'')$ is a Bayes solution
with respect to $\lambda$ for the restricted product problem. For let $(\delta', \delta'')$ be any other
compatible procedure. Then

$\int R_{\delta_0'}(\theta)\, d\lambda(\theta) \le \int R_{\delta'}(\theta)\, d\lambda(\theta); \quad \int R_{\delta_0''}(\theta)\, d\lambda(\theta) \le \int R_{\delta''}(\theta)\, d\lambda(\theta)$,

and the result follows from (6.2).

(ii) Minimax procedures. Let $\delta_0'$ and $\delta_0''$ be minimax solutions of the component
problems, and suppose that the same sequence of a priori distributions $\{\lambda_n\}$ is
least favorable for both so that

$\sup R_{\delta_0'}(\theta) = \lim \int R_{\delta_n'}(\theta)\, d\lambda_n(\theta)$,

$\sup R_{\delta_0''}(\theta) = \lim \int R_{\delta_n''}(\theta)\, d\lambda_n(\theta)$,

where $\delta_n'$ and $\delta_n''$ denote any Bayes solutions with respect to $\lambda_n$. Then, if
compatible with the given set of restrictions, the procedure $\delta_0 = (\delta_0', \delta_0'')$ is a
minimax solution of the restricted product problem.

To prove this we note as a consequence of the minimax property that

$\sup [R_{\delta_0'}(\theta) + R_{\delta_0''}(\theta)] \le \lim \int [R_{\delta_n'}(\theta) + R_{\delta_n''}(\theta)]\, d\lambda_n(\theta)$,


CORRECTION 
THE ANNALS OF MATHEMATICAL STATISTICS 
Vol. 28, No. 1 March 1957 


Page 14, formula (4.12) through page 17, line 23 should be exchanged with page 70, line 8 
through page 72, next-to-last line. 







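while also

$\sup [R_{\delta_0'}(\theta) + R_{\delta_0''}(\theta)] \ge \int [R_{\delta_0'}(\theta) + R_{\delta_0''}(\theta)]\, d\lambda_n(\theta) \ge \int [R_{\delta_n'}(\theta) + R_{\delta_n''}(\theta)]\, d\lambda_n(\theta)$,

and hence

$\sup [R_{\delta_0'}(\theta) + R_{\delta_0''}(\theta)] = \lim \int [R_{\delta_n'}(\theta) + R_{\delta_n''}(\theta)]\, d\lambda_n(\theta)$,

which is a sufficient condition for $\delta_0$ to be minimax.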

(iii) Procedures with uniformly minimum risk. Let $\mathcal{C}'$ and $\mathcal{C}''$ be classes of decision
procedures for the two component problems, and suppose that within these
classes the procedures $\delta_0'$ and $\delta_0''$ respectively have uniformly minimum risk. Let
$\mathcal{C}$ be the class of compatible pairs $(\delta', \delta'')$ with $\delta' \in \mathcal{C}'$ and $\delta'' \in \mathcal{C}''$. Then if $(\delta_0', \delta_0'')$
is compatible and hence belongs to $\mathcal{C}$, the procedure $\delta_0 = (\delta_0', \delta_0'')$ uniformly
minimizes the risk within the class $\mathcal{C}$. This follows from the fact that if $\delta_i'$, $\delta_i''$
($i = 1, 2$) are such that $R_{\delta_1'}(\theta) \le R_{\delta_2'}(\theta)$ and $R_{\delta_1''}(\theta) \le R_{\delta_2''}(\theta)$ for all $\theta$, and if
$\delta_i = (\delta_i', \delta_i'')$ are both compatible, then $R_{\delta_1}(\theta) \le R_{\delta_2}(\theta)$ for all $\theta$. This result
again extends immediately to the case of infinite products.

(iv) Unbiasedness. In an earlier paper the author defined a decision procedure
$\delta$ to be unbiased if it satisfies

(6.3) $E_\theta W(\theta', \delta(X)) \ge E_\theta W(\theta, \delta(X))$

for all $\theta$, $\theta'$. For the type of problem with which we are concerned this means
roughly that on the average the actual decision is closer to the correct decision
than to any false one. In this sense the condition is an expression of the requirement
that the decision procedure should not favor any one parameter value, or
any subset, at the expense of all others, but that it should be impartial towards
the various values the parameter can take on. Without some such restriction
minimization of the risk will not lead to acceptable results since the procedure
that without regard to the data takes the constant decision $d_i: \theta \in \Omega_i$ clearly
minimizes the risk for $\theta \in \Omega_i$. As was shown in [9], condition (6.3) reduces to the
usual condition of unbiasedness in the case of hypothesis testing and point estimation
for suitable loss functions.

If $\delta'$ and $\delta''$ are unbiased, it follows from addition of the associated inequalities
(6.3) that the same is true for the product procedure $\delta = (\delta', \delta'')$. More generally
consider products of a family of decision problems with decision spaces $D_\gamma$ and
loss functions $W_\gamma$, $\gamma \in \Gamma$, where the risk function of a product procedure $\delta$ with
components $\delta_\gamma$ is given by

$R_\delta(\theta) = \int R_{\delta_\gamma}(\theta)\, d\mu(\gamma)$.

Then again the unbiasedness of each $\delta_\gamma$ implies that of $\delta$.







The converse, that unbiasedness of a product of two procedures implies that of
the components, is not true in general. However, it does hold for example if
$\theta = (\xi, \eta)$, $W'(\theta, d') = U(\xi, d')$, $W''(\theta, d'') = V(\eta, d'')$, and if the parameter
space $\Omega$ is such that given any points $\xi$ and $\xi'$ in the projection of $\Omega$ onto the
$\xi$-axis, there exist two points in $\Omega$ with abscissae $\xi$ and $\xi'$ respectively and with
common ordinate $\eta$, and if the corresponding condition holds with $\xi$ and $\eta$ interchanged.
For putting $\theta' = (\xi', \eta)$ and $\theta = (\xi, \eta)$, we have

$E_{\xi,\eta} U(\xi', \delta'(X)) + E_{\xi,\eta} V(\eta, \delta''(X)) \ge E_{\xi,\eta} U(\xi, \delta'(X)) + E_{\xi,\eta} V(\eta, \delta''(X))$

and hence

$E_{\xi,\eta} U(\xi', \delta'(X)) \ge E_{\xi,\eta} U(\xi, \delta'(X))$.

Therefore $\delta'$ is unbiased, and analogously also $\delta''$. The above condition on $\Omega$ is
satisfied in particular when $\Omega$ is a direct product, but also for example in a trinomial
(or more generally multinomial) situation with $\xi = p_1$, $\eta = p_2$ and $\Omega$ the
triangle defined by $0 \le p_1, p_2$ and $p_1 + p_2 \le 1$.

If this condition holds, and if $\delta_0'$, $\delta_0''$ are unbiased procedures with uniformly
minimum risk for the component problems and $(\delta_0', \delta_0'')$ is compatible, it follows
from (iii) that $(\delta_0', \delta_0'')$ is unbiased with uniformly minimum risk for the product
problem. In the next section we shall give a much weaker condition on the structure
of the decision problem, for which a similar conclusion holds.


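7. Unbiasedness. We now return to the multiple decision problems of Section 2,
which were obtained as restricted products of the problems of testing
$H_\gamma: \theta \in \omega_\gamma$, $\gamma \in \Gamma$. The losses resulting from false rejection and acceptance are
assumed to be $a_\gamma$ and $b_\gamma$ respectively, so that the risk for the testing problem is

(7.1) $R_{\varphi_\gamma}(\theta) = \begin{cases} a_\gamma E_\theta \varphi_\gamma(X) & \text{for } \theta \in \omega_\gamma, \\ b_\gamma E_\theta[1 - \varphi_\gamma(X)] & \text{for } \theta \in \omega_\gamma^{-1}, \end{cases}$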


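which may be written in a single formula as

(7.2) $R_{\varphi_\gamma}(\theta) = \tfrac{1}{2}(x_\gamma + 1) a_\gamma E_\theta \varphi_\gamma(X) - \tfrac{1}{2}(x_\gamma - 1) b_\gamma E_\theta[1 - \varphi_\gamma(X)]$ for $\theta \in \omega_\gamma^{x_\gamma}$.

The risk of the product procedure is therefore

(7.3) $R_\delta(\theta) = E_\theta \int [\tfrac{1}{2}(x_{i\gamma} + 1) a_\gamma \varphi_\gamma(X) - \tfrac{1}{2}(x_{i\gamma} - 1) b_\gamma (1 - \varphi_\gamma(X))]\, d\mu(\gamma)$,

when $\theta \in \Omega_i = \bigcap_\gamma \omega_\gamma^{x_{i\gamma}}$ and the $x$'s are defined as in (2.1).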

The purpose of the present and following sections is to prove that all of the 
procedures described in Sections 3 to 5 are unbiased, and among all unbiased 
procedures possess uniformly minimum risk, when the loss function is given by 
(2.4). The result is independent of the weight function $\mu$, provided in the case
with infinite $\Gamma$, $\mu$ is equivalent to Lebesgue measure in the sense of mutual absolute
continuity. It is however valid only within the class of procedures the risk
of which is finite for the chosen $\mu$.

5. The methods used herein (especially the geometric type argument of 









Theorem 3 and Lemma 4.3) may also be usefully applied to obtain structure
theorems in other sequential decision problems under suitable regularity conditions.
For example, the stopping rule for any Bayes solution (risk = expected
loss + $E n$) for the $k$-decision problem of choosing which of $\theta_1 < \theta_2 < \cdots < \theta_k$
is the true parameter value when the $X_i$ have the density $f(x, \theta_j)$ of (3) of
Lemma 4.2 for some $j$ is easily seen for large $n$ to approach that which says to
stop if and only if each of $k - 1$ certain SPRT's of $\theta_i$ against $\theta_{i+1}$ ($1 \le i < k$)
says to stop.

6. For practical use, our results may be put into more convenient form. 
For example, if f; is for some A > 0 the normal density with mean Aj and unit 
variance, our results say that there are constants A; > A, > --- > Ay. > 1 
such that the (essentially unique) procedure with type I and type II errors $\alpha$
which minimizes $A_0(\delta)$ stops the first time $|\sum_{i=1}^{n} X_i| \ge A_n$ (making the appropriate
decision) and never takes more than $N$ observations (assuming a first
observation is taken with probability one). Similar characterizations in the 
space of the range of the sufficient statistic may be made in other cases of Lemma 
4.2. 
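
The stopping rule of Remark 6 can be sketched in code. This is a minimal illustration only: the function name `run_truncated_test` and the numerical boundary values are hypothetical, and the final boundary value 0.0 is a device chosen here to force a decision at the last permitted observation, not a value taken from the paper.

```python
def run_truncated_test(observations, boundaries):
    """Stop the first time |X_1 + ... + X_n| >= A_n, where A_n = boundaries[n - 1].

    Returns (decision, n): decision is +1 or -1 according to the sign of the
    cumulative sum at stopping, and n is the number of observations used.
    Sampling never continues past N = len(boundaries). A cumulative sum of
    exactly zero is resolved in favor of +1, an arbitrary choice.
    """
    total = 0.0
    n = 0
    for n, (x, bound) in enumerate(zip(observations, boundaries), start=1):
        total += x
        if abs(total) >= bound:
            break
    decision = 1 if total >= 0 else -1
    return decision, n


# Hypothetical decreasing boundaries A_1 > A_2 > A_3 > A_4.
boundaries = [3.0, 2.0, 1.0, 0.0]
result = run_truncated_test([0.5, 0.4, 0.3, -0.1], boundaries)
```

In this example the cumulative sums are 0.5, 0.9, and about 1.2, so sampling stops at the third observation with decision +1.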

7. For given $\alpha_i^*$, it is interesting to consider $A_0(\delta^*)$ where $\delta^*$ is the SPRT with
$\alpha_i(\delta^*) = \alpha_i^*$ (which minimizes $E_i n$ for $i = \pm 1$). Let $M = M(\alpha_{-1}^*, \alpha_1^*)$ be the
smallest integer such that (4.1) may be satisfied by a fixed-sample-size procedure
$\delta$ requiring $M$ observations. It is easy to give examples where $A_0(\delta^*) < M$ (e.g.,
let $f_0$ be close to $f_{-1}$ or $f_1$) and where $M < A_0(\delta^*)$ (in the example of Remark 6
above, as $\alpha \to 0$, $A_0(\delta^*)$ is of order $(\log \alpha)^2 > M(\alpha, \alpha)$). It would be interesting
to obtain useful inequalities and limiting formulas for $e(\alpha_{-1}, \alpha_1)$, as well as
$e(\alpha_{-1}, \alpha_1) / A_0(\delta^*)$ and $e(\alpha_{-1}, \alpha_1) / M(\alpha_{-1}, \alpha_1)$, analogous to those which can be
obtained in sequential analysis [6]. Of course, if for each $\xi^{(0)}$ one has a knowledge
of an upper bound on $N$, one can compute the procedures of Theorem 4 (for all
$\alpha$) by "working backwards" as in [3], [4]. Without investigating these topics
further, we mention an interesting suggestion of Wolfowitz (who is also to be
thanked for suggesting the problem of this section): There is in Case (3) of
Lemma 4.2 with $\mu$ equivalent to Lebesgue measure a one-parameter family
$C$ of tests of the form "stop the first time there is a violation of the inequality
$h_1 + sn < \sum_{i=1}^{n} X_i < h_2 + sn$ ($h_1$, $h_2$, $s$ constants)" and which satisfy $\alpha_i = \alpha_i^*$
($i = \pm 1$). One of these other than the unique SPRT $\delta^*$ of $f_1$ against $f_{-1}$ which
is a member of $C$ may minimize $A_0(\delta)$ among members of $C$ and may reduce
$A_0(\delta)$ considerably from its value for $\delta^*$. Investigation now being undertaken
shows that this improvement may be appreciable in practical examples and
can often be achieved without modifying $E_i n$ greatly for $i = \pm 1$. We also
remark that truncated SPRT's will often be much better than untruncated
SPRT's (e.g., in the example of Remark 6 for $\alpha$ small) in making $A_0(\delta)$ small
subject to (4.1); some data on this are available in (e.g.) [7]. These remarks
apply also to 9 and 10 below.
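
The family $C$ of tests with parallel linear boundaries can be sketched as follows. The function name `linear_boundary_test` is hypothetical, as is the convention that reaching the upper line means accepting $H_1$ and reaching the lower line means accepting $H_{-1}$; boundary contact is counted as a violation of the strict inequality. As an assumption for orientation only: for unit-variance normal densities with means $\pm\lambda$, the logarithm of the likelihood ratio is $2\lambda \sum X_i$, so the SPRT of $f_1$ against $f_{-1}$ is the member with $s = 0$.

```python
def linear_boundary_test(observations, h1, h2, s):
    """Sample until h1 + s*n < X_1 + ... + X_n < h2 + s*n is violated.

    Returns (decision, n): decision is +1 if the upper line is reached,
    -1 if the lower line is reached, and None if the data run out first.
    """
    total = 0.0
    n = 0
    for n, x in enumerate(observations, start=1):
        total += x
        if total >= h2 + s * n:
            return 1, n
        if total <= h1 + s * n:
            return -1, n
    return None, n


# Hypothetical constants; s = 0 gives horizontal boundaries.
upper_hit = linear_boundary_test([1.0, 1.0, 1.0], h1=-2.0, h2=2.0, s=0.0)
lower_hit = linear_boundary_test([-1.0, -1.0], h1=-1.5, h2=1.5, s=0.0)
undecided = linear_boundary_test([0.5, 0.5, 0.5, 0.5], h1=-1.0, h2=1.0, s=0.5)
```

Estimating $A_0(\delta)$ for a given member of the family would require simulating observations from $f_0$ and averaging the returned $n$, which is not attempted here.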

8. Our results may be extended in an obvious fashion to consideration of
minimizing $A_0(\delta)$ subject to (4.1) for continuous time processes [8].







9. One may obtain a result similar to that of our Theorem 4 for the problem
of minimizing subject to (4.1) a (probability) average of $E_\theta n$ over a set of $\theta$
between $\theta_{-1}$ and $\theta_1$ in the cases of Lemma 4.2 (see also Section 3). This corresponds
to the practical situation where $\theta$ may be thought of as having a known
probability distribution (e.g., certain industrial problems).


10. In any of the cases of Lemma 4.2, one can obtain results on the problem
of minimizing $\sup_{\theta_{-1} < \theta < \theta_1} E_\theta n$ subject to (4.1). This can be done by obvious
application of the Bayes technique, using Remark 9. In some cases it will be
easy to guess at a value $\theta_0$ ($\theta_{-1} < \theta_0 < \theta_1$) such that a procedure minimizing
$A_0(\delta)$ subject to (4.1) has its maximum $E_\theta n$ at $\theta = \theta_0$. This procedure will then
clearly minimize $\sup_\theta E_\theta n$.

11. Results like those of Theorem 4 and Remarks 9 and 10 can also be obtained
if a restriction of the type $E_i n \le e_i$ ($i = \pm 1$) is imposed in addition to
(4.1).

12. Lemma 4.2 can easily be extended to include many other cases, e.g.,
many cases arising in simple fashion from those of Lemma 4.2. For example
Lemma 4.2 also holds for $f(x, \theta) = (t + 1) x^t \theta^{-(t+1)}$ if $0 < x < \theta$ (and 0 otherwise),
where $t > -1$.


REFERENCES 

[1] L. Weiss, "Testing one simple hypothesis against another," Ann. Math. Stat., Vol. 24
(1953), pp. 273-281.

[2] L. LeCam, "Note on a theorem of Lionel Weiss," Ann. Math. Stat., Vol. 25 (1954),
pp. 791-794.

[3] A. Wald, Statistical Decision Functions, John Wiley & Sons, 1950.

[4] A. Wald and J. Wolfowitz, "Bayes solutions of sequential decision problems," Ann.
Math. Stat., Vol. 21 (1950), pp. 82-89.

[5] H. Rubin, "A complete class of decision procedures for distributions with monotone
likelihood ratio," (abstract) Ann. Math. Stat., Vol. 22 (1951), p. 608.

[6] A. Wald, Sequential Analysis, John Wiley & Sons.

[7] R. E. Bechhofer, J. Kiefer, and M. Sobel, "A Sequential Multiple Decision Procedure
for Certain Identification and Ranking Problems," to be published.

[8] A. Dvoretzky, J. Kiefer, and J. Wolfowitz, "Sequential decision problems for processes
with continuous time parameter. Testing hypotheses," Ann. Math. Stat.,
Vol. 24 (1953), pp. 254-264.





THE MINIMUM DISTANCE METHOD
By J. WOLFOWITZ
Cornell University 


1. Summary and Introduction. The present paper gives the formal statements
and proofs of the results illustrated in [1]. In a series of papers ([2], [3], [4]) the
present author has been developing the minimum distance method for obtaining
strongly consistent estimators (i.e., estimators which converge with probability
one). The method of the present paper is much superior, in simplicity
and generality of application, to the methods used in the papers [2] and [4] cited
above. Roughly speaking, the present paper can be summarized by saying that,
in many stochastic structures where the distribution function (d.f.) depends
continuously upon the parameters and d.f.'s of the chance variables in the structure,
those parameters and d.f.'s which are identified (uniquely determined by
the d.f. of the structure) can be strongly consistently estimated by the minimum
distance method of the present paper. Since identification is obviously a
necessary condition for estimation by any method, it follows that, in many
actual statistical problems, identification implies estimatability by the method
of the present paper.

Thus problems of long standing like that of Section 5 below are easily solved. 
For this problem the whole canonical complex (Section 6 below; see [1]) has 
never, to the author’s knowledge, been estimated by any other method. The 
directional parameter of the structure of Section 4 seems to be here estimated 
for the first time. 

As the identification problem is solved for additional structures it will be pos- 
sible to apply the minimum distance method. The proofs in the present paper 
are of the simplest and most elementary sort. 

In Section 8 we treat a problem in estimation for nonparametric stochastic 
difference equations. Here the observed chance variables are not independent, 
but the minimum distance method is still applicable. The treatment is incom- 
parably simpler than that of [4], where this and several other such problems are 
treated. The present method can be applied to the other problems as well. 

Application of the present method is routine in each problem as soon as the 


identification question is disposed of. In this respect it compares favorably with 
the method of [4], whose application was far from routine. 

As we have emphasized in [1], the present method can be applied with very 
many definitions of distance (this is also true of the earlier versions of the mini- 
mum distance method). The definition used in the present paper has the con- 
venience of making a certain space conditionally compact and thus eliminating 


the need for certain circumlocutions. Since no reason is known at present for 


Received January 31, 1956.

Research under contract with the Office of Naval Research.





76 J. WOLFOWITZ 


preferring one definition of distance to another we have adopted a convenient 
definition. It is a problem of great interest to decide which, if any, definition of 
distance yields estimators preferable in some sense. The definition of distance 
used in this paper was employed in [9]. 

As the problem is formulated in Section 2 below (see especially equation (2.1)),
the "observed" chance variables $\{X_i\}$ are known functions (right members of
(2.1)) of the "unobservable" chance variables $\{Y_j\}$ and of the unknown constants
$\{\theta_i\}$. In the problems treated in [3], [9], and [11], it is the distribution of
the observed chance variables which is a known function of unobservable chance
variables and of unknown constants, and not the observed chance variables
themselves. However, the latter problems can easily be put in the same form
as the former problems. Moreover, in the method described below the values
of the observed chance variables are used only to estimate the distribution
function of the observed chance variables (by means of the empiric distribution
function). Consequently there is no difference whatever in the treatment of the
problems by the minimum distance method, no matter how the problems are
formulated.

The unobservable chance variables $\{Y_j\}$ correspond to what in [11] are called
"incidental parameters"; the unknown constants $\{\theta_i\}$ are called in [11] "structural
parameters". In [9] there is a discussion of the fact that in some problems
treated in the literature the incidental parameters are considered as constants
and in other problems as chance variables. In contradistinction to the present
paper [3] (in particular its Section 5) treats the incidental parameters as unknown
constants. The fundamental idea of both papers is the same: The estimator is
chosen to be such a function of the observed chance variables that the d.f. of
the observed chance variables (when the estimator is put in place of the parameters
and distributions being estimated) is "closest" to the empiric d.f. of the
observed chance variables. The details of application are perhaps easier in the
present paper; the problems are different and of interest per se.


2. The minimum distance method. Let $m$, $m'$, $k$, $k'$ be integers such that
$0 \le m \le m'$, $0 \le k \le k'$. For $j = 1, 2, \ldots$ ad inf. let $(Y_{j1}, \ldots, Y_{jk'})$ be independent,
identically distributed vector chance variables with the common d.f.
$G_0$ which is unknown to the statistician. The constants $\theta_1, \ldots, \theta_{m'}$ are also unknown
to the statistician. It is known that, for $j = 1, 2, \ldots$ ad inf.,

(2.1) $X_{ji} = \varphi_i(Y_{j1}, \ldots, Y_{jk'}, \theta_1, \ldots, \theta_{m'})$, $i = 1, \ldots, h$,

where the $\varphi_i$, for $i = 1, \ldots, h$, are known Borel-measurable functions of the arguments
exhibited. Define the common d.f. of $(Y_{j1}, \ldots, Y_{jk})$, $j = 1, 2, \ldots$ ad
inf., by

$G(y_1, \ldots, y_k) = G_0(y_1, \ldots, y_k, +\infty, \ldots, +\infty)$.

Let $\theta = (\theta_1, \ldots, \theta_{m'})$. Let $A = \{(\alpha, g)\}$ be a space of couples $(\alpha, g)$, the first
member of which is a real $m'$-dimensional vector $(\alpha_1, \ldots, \alpha_{m'})$, and the second




MINIMUM DISTANCE METHOD 77 


member of which is a $k'$-dimensional d.f. It is known that $(\theta, G_0)$ is in $A$. On $A$
we define a metric $\delta$ as follows:

(2.2) $\delta([\alpha_1, g_1], [\alpha_2, g_2]) = \sum_{j=1}^{m'} |\arctan \alpha_{1j} - \arctan \alpha_{2j}| + \int |g_1(z) - g_2(z)|\, e^{-|z|}\, dz_1 \cdots dz_{k'}$


where 


We shall also use $\delta$ to denote a metric on any Euclidean space or on any space of
d.f.'s of the same dimensionality. In that case $\delta$ is to be understood as the expression
corresponding, respectively, to the first or second term of the right
member of (2.2).

Our problem is to give (strongly) consistent estimators of $G$ and $(\theta_1, \ldots, \theta_m)$,
i.e., for $n = 1, 2, \ldots$ ad inf., to construct measurable functions $(Q_{n1}, Q_{n2})$ from
$hn$-dimensional Euclidean space (of $X_{11}, \ldots, X_{1h}, \ldots, X_{n1}, \ldots, X_{nh}$) to $A$
such that, whatever be $(\theta, G_0)$ (in $A$), we have, with probability one (w.p. 1), both

$Q_{n1}^{(j)} \to \theta_j$,

where $Q_{n1}^{(j)}$ is the $j$th component of $Q_{n1}$, and

$Q_{n2}(y_1, \ldots, y_k, +\infty, \ldots, +\infty) \to G(y_1, \ldots, y_k)$

at every point of continuity of the latter.

Let $J(\alpha, g)$ be the ($h$-dimensional) d.f. of $(X_{11}, \ldots, X_{1h})$ when $\theta = \alpha$ and
$G_0 = g$. In this notation the generic point in $h$-space is suppressed since it will
rarely come explicitly into play, and the emphasis is on the fact that this is a
transformation from $A$ into the space of $h$-dimensional d.f.'s. We shall make the
following Identification and Continuity (I.C.) Assumption:

Let $\{\alpha_i, g_i\}$ be any Cauchy sequence (i.e., as $i \to \infty$, $\delta([\alpha_i, g_i], [\alpha_{i+n}, g_{i+n}]) \to 0$
uniformly in $n$) such that

(2.3) $\delta(J(\alpha_i, g_i), J(\theta, G_0)) \to 0$

as $i \to \infty$. Then, as $i \to \infty$,

(2.4) $\alpha_{ij} \to \theta_j$, $j = 1, \ldots, m$

($\alpha_{ij}$ is the $j$th component of $\alpha_i$) and

(2.5) $g_i(y_1, \ldots, y_k, +\infty, \ldots, +\infty) \to G(y_1, \ldots, y_k)$

at every point of continuity of the latter.


Let

$C_n = \{(X_{j1}, \dots, X_{jh}), \; j = 1, \dots, n\}$

and $F_n(C_n)$ be the empiric d.f. of $C_n$, i.e., the $h$-dimensional d.f. such that its
value at $(y_1, \dots, y_h)$ is $1/n$ times the number of elements in $C_n$ for
which $X_{ji} < y_i$, $i = 1, \dots, h$. Let $\gamma(n)$ be any positive function of $n$ which
approaches zero as $n \to \infty$. Let $S_n(C_n) = (\theta_n^*, G_{0n}^*)$ be any function from the
$hn$-dimensional Euclidean space of $C_n$ to the space $A$ which is measurable and such
that

(2.6)  $\delta(J(\theta_n^*, G_{0n}^*), F_n(C_n)) < \inf_{(\alpha, g) \in A} \delta(J(\alpha, g), F_n(C_n)) + \gamma(n).$

$S_n(C_n)$ is a minimum distance estimator, for which the following holds:
THEOREM. If the I.C. Assumption holds, then, with probability one (w.p. 1),

(2.7)  $\theta_{nj}^* \to \theta_j, \quad j = 1, \dots, m$

($\theta_{nj}^*$ is the $j$th component of $\theta_n^*$) and

(2.8)  $G_{0n}^*(y_1, \dots, y_k, +\infty, \dots, +\infty) \to G(y_1, \dots, y_k)$

at every continuity point of the latter.

(In view of (2.7) and (2.8) it is actually the appropriate components of $S_n(C_n)$
which could be called minimum distance estimators of the $\theta_j$ and $G$.)

PROOF: By the Glivenko-Cantelli theorem we have that, w.p.1,

(2.9)  $\delta(F_n(C_n), J(\theta, G_0)) \to 0$

as $n \to \infty$. Hence, w.p.1,

(2.10)  $\inf_{(\alpha, g) \in A} \delta(F_n(C_n), J(\alpha, g)) \to 0.$

Hence, w.p.1,

(2.11)  $\delta(J(\theta_n^*, G_{0n}^*), J(\theta, G_0)) \to 0.$

Since the space $A$ is (sequentially) conditionally compact (with respect to the
metric $\delta$) it follows that, at every sample point and from every subsequence of
$S_n(C_n)$, $n = 1, 2, \dots$, we can select a Cauchy subsequence. For every such
sequence the relation corresponding to (2.11) holds, except on a set of sample
points of probability zero. When the relation corresponding to (2.11) holds,
then, by the I.C. Assumption, relations corresponding to (2.7) and (2.8) hold.
Thus, we have proved that, except on a set of sample points of probability zero,
every subsequence of $S_n(C_n)$ contains a subsequence for which the equations
corresponding to (2.7) and (2.8) hold. But this easily implies the theorem.
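
The minimum distance idea can be illustrated numerically by a toy sketch. It is not the general nonparametric estimator of this section: it assumes a one-parameter location family N(a, 1) in place of $J(\alpha, g)$, approximates the second term of the metric (2.2) by a Riemann sum on a grid, and minimizes exactly over a finite grid of candidates (so the slack $\gamma(n)$ plays no role). All function names are hypothetical.

```python
import math
import numpy as np

def empirical_cdf(sample, z):
    """F_n(z) = (1/n) * #{x_j < z}, using the strict inequality of the text."""
    ordered = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(ordered, z, side="left") / ordered.size

def normal_cdf(z, loc):
    """D.f. of N(loc, 1); an assumed stand-in for J(a, g) in this toy model."""
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf((z - loc) / math.sqrt(2.0)))

def weighted_l1(f, g, z):
    """Riemann-sum approximation of the integral of |f - g| * exp(-|z|)."""
    dz = z[1] - z[0]
    return float(np.sum(np.abs(f - g) * np.exp(-np.abs(z))) * dz)

def minimum_distance_location(sample, candidates, z):
    """Return the candidate location whose d.f. is closest to F_n."""
    fn = empirical_cdf(sample, z)
    distances = [weighted_l1(normal_cdf(z, a), fn, z) for a in candidates]
    return float(candidates[int(np.argmin(distances))])

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.7, scale=1.0, size=2000)   # true location 0.7 (assumed)
z_grid = np.linspace(-8.0, 8.0, 1601)                # integration grid
candidates = np.linspace(-2.0, 2.0, 401)             # candidate locations
estimate = minimum_distance_location(sample, candidates, z_grid)
print(estimate)
```

With a sample of this size the printed estimate should lie close to the location used to generate the data, in line with the consistency asserted by the theorem.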


3. Discussion of the Identification and Continuity Assumption. We have 
seen that the proof of the strong consistency of the minimum distance estimator 
follows almost trivially from the I.C. Assumption. Let us now examine this as- 
sumption more carefully. 

The constants $(\theta_1, \dots, \theta_m)$ and the d.f. $G$, which belong to the "structure"
(system) (2.1), are said to be "identified in $A$" if, when $(\alpha, g)$ is in $A$, and

(3.1)  $J(\theta, G_0) = J(\alpha, g)$

identically in the $h$ arguments, then

(3.2)  $a_j = \theta_j, \quad j = 1, \dots, m,$
       $G(y_1, \dots, y_k) = g(y_1, \dots, y_k, +\infty, \dots, +\infty)$

identically in $y_1, \dots, y_k$ (of course $(\theta, G_0)$ is in $A$). It is obvious that
identification in $A$ is an indispensable condition for our problem of estimating consistently
the constants $\theta_1, \dots, \theta_m$ and the d.f. $G$, for no particular value of the sequence

$\{(X_{j1}, \dots, X_{jh}), \; j = 1, 2, \dots\}$

can furnish more information than the function $J(\theta, G_0)$ itself.
In most, if not all, actual statistical problems, $J$ will be a continuous function
on $A$, i.e., whenever

$(\alpha_i, g_i) \to (\alpha, g)$ in $A$,

we have

$\delta(J(\alpha_i, g_i), J(\alpha, g)) \to 0.$


We shall assume that this is so in the remainder of this section. Then the follow- 
ing considerations will help to understand the I.C. Assumption and to furnish a 
convenient way of proving that it is satisfied. 

Let $C_1$ be the map of $A$ under $J$, i.e.,

$C_1 = \{J(\alpha, g), \; (\alpha, g) \in A\}.$

Let $\{\alpha_i, g_i\}$ be any Cauchy sequence in $A$ which does not have a limit in $A$,
and for which

$J(\{\alpha_i, g_i\}) = \lim_{i \to \infty} J(\alpha_i, g_i)$

exists. Let $C_2$ be the totality of all such $J(\{\alpha_i, g_i\})$. ($C_1$ and $C_2$ need not be
disjoint.)

The indispensable condition of identification² in $A$ may be stated as follows: If

$(\alpha_i, g_i) \to (\alpha, g)$ in $A$

and

$J(\alpha_i, g_i) \to J(\theta, G_0),$

then $a_j = \theta_j$, $j = 1, \dots, m$, and

$g(y_1, \dots, y_k, +\infty, \dots, +\infty) = G(y_1, \dots, y_k)$

identically in $y_1, \dots, y_k$. If either of the two following conditions is also met
the I.C. Assumption is fulfilled:


2 We remind the reader that J is assumed to be continuous on A. 







A) $J(\theta, G_0)$ is not a member of $C_2$.

B) If $J(\theta, G_0) = J(\{\alpha_i, g_i\})$ is³ in $C_2$, then $a_{ij} \to \theta_j$, $j = 1, \dots, m$, and
$g_i(y_1, \dots, y_k, +\infty, \dots, +\infty) \to G(y_1, \dots, y_k)$ at every point of
continuity of the latter.

Thus the I.C. Assumption is, for most A to be encountered in actual problems 
where J is continuous, not much more onerous than the indispensable identifica- 
tion condition. In the important examples to be discussed below condition A, 
and hence the I.C. Assumption, will hold. 


4. A linear relationship between two chance variables subject to independent 
errors. We illustrate the last two sections by application to the following very
important structure: Suppose it is known to the statistician that, for $j = 1, 2, \dots$,
ad inf.,

(4.1)  $X_{j1} = \xi_j + v_{j1}$
(4.2)  $X_{j2} = \alpha + \beta \xi_j + v_{j2}$

where $\alpha$ and $\beta$ are constants⁴ unknown to the statistician, and $\{v_{j1}\}$, $\{v_{j2}\}$, and
$\{\xi_j\}$ are sequences of independent, identically distributed chance variables, with
respective d.f.'s $L_1$, $L_2$, $L_3$, say, which are unknown to the statistician. The
different sequences are known to be independent of each other. We shall consider
first the problem of estimating $\beta$.
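
To see why estimating $\beta$ in (4.1), (4.2) is not routine, the structure can be simulated. The sketch below is illustrative only: the exponential law for $\xi_j$ and the unit-variance normal errors are assumptions chosen for convenience, and the least-squares comparison is a standard errors-in-variables fact rather than something taken from this paper. The slope of $X_{j2}$ on $X_{j1}$ tends to $\beta \, \mathrm{Var}(\xi) / (\mathrm{Var}(\xi) + \mathrm{Var}(v_1))$, which here is 1 and not $\beta = 2$.

```python
import numpy as np

def simulate_structure(n, alpha, beta, rng):
    """Draw n pairs (X_j1, X_j2) from (4.1)-(4.2) under assumed distributions."""
    xi = rng.exponential(scale=1.0, size=n)   # non-normal xi_j, variance 1 (assumed)
    v1 = rng.normal(0.0, 1.0, size=n)         # error in X_j1, variance 1 (assumed)
    v2 = rng.normal(0.0, 1.0, size=n)         # error in X_j2, variance 1 (assumed)
    x1 = xi + v1
    x2 = alpha + beta * xi + v2
    return x1, x2

def least_squares_slope(x1, x2):
    """Ordinary least-squares slope of x2 on x1."""
    cov = np.cov(x1, x2)
    return float(cov[0, 1] / cov[0, 0])

rng = np.random.default_rng(1)
x1, x2 = simulate_structure(200_000, alpha=0.5, beta=2.0, rng=rng)
slope = least_squares_slope(x1, x2)
print(slope)  # close to 1.0, the attenuated value, rather than beta = 2.0
```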

Let $d$ be the generic designation of a complex

$\{a, b, p_1, p_2, p_3\}$

whose first two elements are real numbers, and whose last three elements are
one-dimensional d.f.'s. Let

$d_0 = \{\alpha, \beta, L_1, L_2, L_3\}.$

The symbol $J(d)$ will have the same meaning as in Section 2.

We shall assume that $A$ is the totality of all $d$'s such that $p_3$ is not a normal
d.f.; for the purposes of this definition and elsewhere in this paper a d.f. which
assigns probability one to a single point is to be considered normal (with variance
zero). It was proved by Reiersøl [5] that $\beta$ is identified in $A$; actually an
examination of his proof (especially equation (4.5)) shows that Reiersøl proved somewhat
more, namely that, if

(4.3)  $J(d_0) = J(\{a', b', p_1', p_2', p_3'\})$


’ The preceding symbol has been defined in the first displayed equation which pre- 
cedes (3.5). 

4 This formulation does not include the case when the regression line is parallel to the
axis of $X_2$, i.e., when $\beta = \infty$; in that case $X_{j1} = \text{constant} + v_{j1}$, $X_{j2} = \xi_j + v_{j2}$. This
omission is made only in the interest of simplicity. We invite the reader to verify that the
formulation where this case is also a possibility can be treated by the methods of the present
paper in exactly the same way as this is done in Sections 4, 5, and 6.







where $a'$ and $b'$ are finite, $p_1'$ and $p_2'$ are d.f.'s, but the d.f. $p_3'$ is not required to be
not normal, then $b' = \beta$ and $p_3'$ must be not normal. Thus $(a', b', p_1', p_2', p_3')$
is in $A$.

It is obvious that $J(d)$ is a continuous function of $d$ (on $A$).

Let $\{d_i = (a_i, b_i, p_{i1}, p_{i2}, p_{i3}), \; i = 1, 2, \dots\}$ be any Cauchy sequence in $A$
which does not have a limit in $A$ and which is such that

(4.4)  $J(\{d_i\}) = \lim_{i \to \infty} J(d_i)$

exists. Let $d^* = (a^*, b^*, p_1^*, p_2^*, p_3^*)$ be such that $\delta(d_i, d^*) \to 0$. Then at least
one of the following four properties must hold:

1) $p_3^*$ is a normal d.f., $a^*$ and $b^*$ are both finite, and $p_1^*$ and $p_2^*$ have variation
one.

2) $p_3^*$ is a normal d.f., either $a^* = \pm\infty$ or $b^* = \pm\infty$ or both, and $p_1^*$ and $p_2^*$
have variation one.

3) $p_3^*$ is a non-normal d.f. (therefore has variation one), either $a^* = \pm\infty$ or
$b^* = \pm\infty$ or both, and both $p_1^*$ and $p_2^*$ have variation one.

4) the variation of at least one of $p_1^*$, $p_2^*$, $p_3^*$ is less than one.

We shall now show that the I.C. Assumption is satisfied in the present problem.
We shall try first to show that $J(d_0)$ is not in $C_2$; we will be able to
achieve this except for one obstacle which we will treat somewhat differently.
Suppose then that $J(d_0) = J(\{d_i\})$ were⁵ in $C_2$. Then $d^*$ could not have the
first of the above properties, because of Reiersøl's result cited above. If $d^*$ had
property 2 above then either the variation of $J(\{d_i\})$ would be less than one
(which cannot be true of $J(d_0)$) or else $J(\{d_i\})$ is the same as $J(\{a'', b'', p_1'', p_2'', p_3''\})$,
where $a''$ and $b''$ are finite, $p_3''$ assigns probability one to a single point and is
therefore normal, and $p_1''$ and $p_2''$ have variation one (which cannot be true of
$J(d_0)$ because of Reiersøl's result cited above). If $d^*$ had property 3 above then
$J(\{d_i\})$ would be of variation less than one, which cannot be true of $J(d_0)$. If
$d^*$ had property 4 above then either $J(\{d_i\})$ has variation less than one (in which
case $J(d_0)$ is not in $C_2$) or else $J(\{d_i\}) = J(d_0)$ is in $C_2$! To see how this can
happen we note that, if $z$ and $z'$ are any real numbers,

$\xi_j + v_{j1} = (\xi_j + z) + (v_{j1} - z)$
$\alpha + \beta \xi_j + v_{j2} = (\alpha + z' - \beta z) + \beta(\xi_j + z) + (v_{j2} - z').$

If either $z$ or $z'$ or both approach $\pm\infty$ the variations of some or all of $p_1^*$, $p_2^*$, $p_3^*$
will be zero. This difficulty is easily overcome. One can show that in this case
condition B of Section 3 holds. A method which is essentially the same but
formally simpler is the following: Without changing the problem or any loss of
generality we can reduce the set $A$ so as to prevent the occurrence of this case.
We simply define $A$ as the totality of all $d$'s which, in addition to the conditions
previously imposed, satisfy the requirement that the smallest median of both $p_1$

5 The preceding symbol was defined in (4.4).







and $p_2$ is zero. It is clear that the estimation of $\beta$ is not affected by this additional
restriction, and that, under this restriction, $J(d_0)$ cannot be in $C_2$. (The
definition of $d_0$, but not the value of $\beta$, may be affected by this restriction.)

It is obvious that, unless the space $A$ is suitably reduced, the parameter $\alpha$
cannot be identified. In [5] Reiersøl states the result that, if the space $A$ is that
subset of the originally defined $A = \{d\}$ where the $d$'s are subject to the further
restriction

(4.5)  median of $p_1$ = median of $p_2$ = 0,

then $\alpha$ is also identified on (the new) $A$. It seems to the writer that one must
make precise which median is meant in order to make the proof of [5] go through.
Either of the following conditions, for example, will permit the proof of [5] to
go through:

(4.6)  $p_1$ and $p_2$ each have zero as the unique median
(4.7)  $p_1$ and $p_2$ have zero as the smallest (largest) median.

What will suffice is a condition such that, if $p_1(x)$, $p_2(x)$ are the third and fourth
components of a point in $A$, $p_1(x + c_1)$, $p_2(x + c_2)$ cannot be the third and fourth
elements of any point in $A$ unless $c_1 = c_2 = 0$.

Suppose, for example, one adopts the restriction (4.6) above. Then $\alpha$ is identified
on (the new) $A$, by the result of [5]. Let $\bar{A}$ be the totality of limit points of $A$
which are not in $A$. $\bar{A}$ will include points whose third and fourth elements will
not have zero as a unique median. In order to show, just as before, that $J(d_0)$
is not in $C_2$, we need the additional result analogous to the one implicit in [5]
about $\beta$ and cited above, namely that, if $J(d_0) = J(\{d_i\})$, then the first element
of $d^*$ is $\alpha$. However, this result does not seem to be implicit in [5] under the
condition (4.6), and a stronger condition may be needed.


5. A linear relationship between two chance variables whose errors are jointly 
normally distributed. The following structure is a very famous one with a long
history of study (see [2] and [5], for example). Let it be known to the statistician
that $X_{j1}$ and $X_{j2}$ satisfy (4.1) and (4.2) respectively, that $\alpha$ and $\beta$ are unknown
constants,⁶ that the two sequences $\{\xi_j\}$ and $\{(v_{j1}, v_{j2})\}$, $j = 1, 2, \dots$, ad inf.,
of independent and identically distributed chance variables are distributed
independently of each other, and that the common distribution of $\{(v_{j1}, v_{j2})\}$ is
normal, with zero means and covariances $\sigma_{11}$ $(= E(v_{j1}^2))$, $\sigma_{12}$ $(= E(v_{j1} v_{j2}))$, and
$\sigma_{22}$ $(= E(v_{j2}^2))$, unknown to the statistician. Designate the common unknown
d.f. of $\{\xi_j\}$ by $L$.


Let $m$ be the generic designation of a complex

$(a, b, c_{11}, c_{12}, c_{22}, l)$


6 See footnote 4. 







such that $a$, $b$ are real numbers, $c_{11}$, $c_{12}$, and $c_{22}$ are real numbers such that the
matrix

$\begin{pmatrix} c_{11} & c_{12} \\ c_{12} & c_{22} \end{pmatrix}$

is non-negative definite, and $l$ is a one-dimensional non-normal d.f. Let $A$ be
the totality of all $m$. It follows from the results of Reiersøl ([5]) and the Cramér-
Lévy theorem ([7], p. 52, Th. 19) that, if

$J(\mu) = J(m^0)$

where

$\mu = (\alpha, \beta, \sigma_{11}, \sigma_{12}, \sigma_{22}, L),$
$m^0 = (a^0, b^0, c_{11}^0, c_{12}^0, c_{22}^0, l^0),$

$\mu$ is of course in $A$, and $m^0$ satisfies all the requirements imposed on the elements
of $A$ except that $l^0$ is not required to be not normal, then $l^0$ must be not normal
and $\alpha = a^0$, $\beta = b^0$.

Obviously $J(m)$ is a continuous function of $m$ on $A$. We will show that
condition A of Section 3 is satisfied, so that the I.C. Assumption is fulfilled, and the
minimum distance estimator of $\alpha$ and $\beta$ is strongly consistent.


Let $\bar{A}$ for the present problem be defined as in Section 4. If a point $m^{00} =
(a^{00}, b^{00}, c_{11}^{00}, c_{12}^{00}, c_{22}^{00}, l^{00})$ is in $\bar{A}$, and, for a sequence $\{m_i\}$ in $A$, $m_i \to m^{00}$ and
$J(m_i) \to J(\{m_i\})$ in $C_2$, at least one of the following must be true:

1) $l^{00}$ is a normal d.f., and $c_{11}^{00}$ and $c_{22}^{00}$ are finite

2) $l^{00}$ is a d.f. which is not normal, and either $a^{00} = \pm\infty$ or $b^{00} = \pm\infty$ or both

3) either $c_{11}^{00} = \infty$ or $c_{22}^{00} = \infty$ or both

4) the variation of $l^{00}$ is less than one.

Suppose $J(\mu)$ were in $C_2$ and $= J(\{m_i\})$. If the first of the conditions above held
then either $J(\{m_i\})$ would be of variation less than one, or $J(\{m_i\})$ would be
normal, neither of which can be true of $J(\mu)$. If one of conditions 2, 3, and 4 held,
then the variation of $J(\{m_i\})$ would be less than one, which of course cannot be
true of $J(\mu)$. This completes the proof that $J(\mu)$ is not in $C_2$.


6. Estimation of the remainder of the structure of Section 5. Let H(y) be 
any one-dimensional d.f. The Gaussian component of $H$ is the largest value of
$\lambda$ for which $H$ can be expressed as the convolution of a normal d.f. with variance
$\lambda$, and another d.f. $H$ is said to have no Gaussian component if its Gaussian
component is zero.

The elements $\sigma_{11}$, $\sigma_{12}$, $\sigma_{22}$, $L$ of $\mu$ are not, in general, uniquely determined by
$J(\mu)$. Among the, in general, infinitely many $m$ such that $J(m) = J(\mu)$, there is
exactly one, say

$\mu_0 = (\alpha, \beta, \sigma_{11}^0, \sigma_{12}^0, \sigma_{22}^0, L_0),$






which is such that $L_0$ has no Gaussian component. We have $\sigma_{11}^0 + \sigma_{22}^0 > c_{11} + c_{22}$
for any other complex $m$ such that $J(m) = J(\mu) = J(\mu_0)$; all such complexes
are readily determinable from $\mu_0$. These remarks follow from (4.1), (4.2), and
the results of Reiersøl [5]. We shall call the complex $\mu_0$ "canonical" and estimate
all its components in a strongly consistent manner. Of course $\alpha$ and $\beta$ have
already been estimated in Section 5; the present method will also estimate them,
inter alia.

Let $Z_1, \dots, Z_n$ be any independent chance variables with the common d.f.
$H(z)$ and the empiric d.f. $H_n(z)$. Let $d(n)$ be any positive function defined on the
positive integers such that $d(n) \to 0$ as $n \to \infty$ and

$P\{\delta(H(z), H_n(z)) > d(n) \text{ for infinitely many } n\} = 0.$


There are many such functions; it is easy to verify that n~””’ is such a function, 
but this is a crude result. For $H(z)$ continuous and one-dimensional and $\delta$ the
Fréchet distance between two d.f.'s there is available the sharp result of Chung
[6], according to which the function

$\left( \dfrac{\log \log n}{c\,n} \right)^{1/2}$

is a function $d(n)$ if $0 < c < 2$.
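
The role of $d(n)$ can be checked numerically. The sketch below reads Chung's rate as $(\log \log n / (cn))^{1/2}$ with $0 < c < 2$, which is an assumed reconstruction of the display above, and it uses the Kolmogorov sup distance against the Uniform(0, 1) d.f. as the distance; both choices, and the function names, are assumptions made for illustration.

```python
import math
import numpy as np

def chung_rate(n, c=1.0):
    """Assumed form d(n) = sqrt(log log n / (c n)); needs n >= 3 and 0 < c < 2."""
    return math.sqrt(math.log(math.log(n)) / (c * n))

def sup_distance_uniform(sample):
    """Kolmogorov sup distance between the empiric d.f. and the Uniform(0, 1) d.f."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    upper = np.arange(1, n + 1) / n - x   # empiric d.f. just after each jump, minus F
    lower = x - np.arange(0, n) / n       # F minus the empiric d.f. just before each jump
    return float(max(upper.max(), lower.max()))

rng = np.random.default_rng(2)
n = 10_000
distance = sup_distance_uniform(rng.uniform(size=n))
print(distance, chung_rate(n, c=0.25))
```

A smaller $c$ gives a larger, more forgiving bound; for a single large $n$ the observed distance will typically fall below it.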
Let

(6.1)  $U(C_n) = (a^*(n), b^*(n), c_{11}^*(n), c_{12}^*(n), c_{22}^*(n), L_n^*)$

be any function from the $2n$-dimensional space of $C_n$ to the space $A = \{m\}$ which
is measurable and such that

(6.2)  $\delta(J(U(C_n)), F_n(C_n)) < d(n)$

and

(6.3)  $c_{11}^*(n) + c_{22}^*(n) + \gamma(n) > \sup \, (c_{11} + c_{22})$

where the supremum operation in the right member is performed over all $m$ such
that

(6.4)  $\delta(J(m), F_n(C_n)) < d(n).$

When there is no $m$ which satisfies (6.4) let $U(C_n)$ be defined in any manner
provided it is measurable. It will follow from the general considerations of the next
section that

(6.5)  $\delta(U(C_n), \mu_0) \to 0$

w.p.1. Thus the elements of $U(C_n)$ are strongly consistent estimators of the
elements of the canonical complex.


7. The method of the maximum index. We shall now generalize the considera- 
tions of the preceding section. 


Consider the structure (2.1), and the totality of $(\alpha, g)$ in $A$ such that $J(\alpha, g) =
J(\alpha^*, g^*)$; call this totality $T(\alpha^*, g^*)$. In every $T(\alpha, g)$ let there be defined a
unique member called the canonical complex of $T(\alpha, g)$; we may denote this
element by $D(\alpha, g)$. If $(\alpha_1, g_1)$ and $(\alpha_2, g_2)$ are such that $J(\alpha_1, g_1) = J(\alpha_2, g_2)$,
then we must have $D(\alpha_1, g_1) = D(\alpha_2, g_2)$. Suppose that there is defined on $A$ a
real-valued function $\psi(\alpha, g)$ such that, whenever

(7.1)  $(\alpha_i, g_i) \to (\alpha, g)$ in $A$,

then

(7.2)  $\liminf_{i \to \infty} \psi(\alpha_i, g_i) \le \psi(\alpha, g)$

and such that, whenever $(\alpha^*, g^*)$ is a canonical complex,

(7.3)  $\psi(\alpha^*, g^*) > \psi(\alpha, g)$

for every other $(\alpha, g)$ in $T(\alpha^*, g^*)$.
Let $d(n)$ be any function defined on the positive integers such that $d(n) \to 0$
as $n \to \infty$ and

(7.4)  $P\{\delta(J(\theta, G_0), F_n(C_n)) > d(n) \text{ for infinitely many } n\} = 0.$

Let $U(C_n) = (\theta_n^{**}, G_{0n}^{**})$ be any function from the $hn$-dimensional Euclidean
space of $C_n$ to the space $A$, which is measurable and such that

(7.5)  $\delta(J(\theta_n^{**}, G_{0n}^{**}), F_n(C_n)) < d(n)$

(7.6)  $\psi(\theta_n^{**}, G_{0n}^{**}) + \gamma(n) > \sup \psi(\alpha, g)$

where the supremum in the right member is over all $(\alpha, g)$ such that

(7.7)  $\delta(J(\alpha, g), F_n(C_n)) < d(n).$

When there is no $(\alpha, g)$ which satisfies (7.7) let $(\theta_n^{**}, G_{0n}^{**})$ be defined in any
manner provided it is measurable. We will call $U(C_n)$ a maximum index
estimator (of $D(\theta, G_0)$) and prove the following

THEOREM. If $J$ is a continuous function on $A$, i.e., whenever $(\alpha_i, g_i) \to (\alpha, g)$
in $A$, $J(\alpha_i, g_i) \to J(\alpha, g)$, and if $J(\theta, G_0)$ is not in $C_2$, then, w.p.1,

(7.8)  $\delta(U(C_n), D(\theta, G_0)) \to 0,$

so that $U(C_n)$ is a strongly consistent estimator of $D(\theta, G_0)$.

PROOF: Obviously $\delta(J(\theta_n^{**}, G_{0n}^{**}), F_n(C_n)) \to 0$, w.p.1. Hence
$\delta(J(\theta_n^{**}, G_{0n}^{**}), J(D[\theta, G_0])) \to 0$, w.p.1.


If (7.8) were not true w.p.1, then, with positive probability, we may choose a
Cauchy subsequence ($A$ is conditionally compact; the particular sequence may
depend upon the sample point in the probability space) which converges to a
point $(\alpha', g')$ in $A$ (since $J(\theta, G_0)$ is not in $C_2$) and $(\alpha', g')$ is not $D[\theta, G_0]$.







It is impossible that, with positive probability,

(7.9)  $\psi(D[\theta, G_0]) > \psi(\alpha', g'),$

because of (7.2) and the fact that, w.p.1, $\delta(J(D[\theta, G_0]), F_n(C_n))$ is eventually
less than $d(n)$.

Suppose that, with positive probability,

(7.10)  $\psi(D[\theta, G_0]) < \psi(\alpha', g').$

Then $(\alpha', g')$ would not be in $T(\theta, G_0)$ and $J(\alpha', g')$ and $J(\theta, G_0)$ would not be
identical. Since $J$ is continuous we must have that, for the Cauchy subsequence,
$\lim_{i \to \infty} J(\theta_{n_i}^{**}, G_{0 n_i}^{**}) = J(\alpha', g')$. Since $\delta(J(\theta_n^{**}, G_{0n}^{**}), J(\theta, G_0)) \to 0$ w.p.1, it
follows that $J(\alpha', g')$ and $J(\theta, G_0)$ are identical, contradicting the above. Hence
(7.10) cannot occur.

Suppose then that, with positive probability,

(7.11)  $\psi(D[\theta, G_0]) = \psi(\alpha', g')$

but $(\alpha', g')$ were not $D(\theta, G_0)$. Because of the maximizing property of $\psi$ (on each $T$)
it would follow that $(\alpha', g')$ is not in $T(\theta, G_0)$. But then $J(\alpha', g')$ and $J(\theta, G_0)$
could not be identical. We have already seen that this cannot be. This leaves,
as the only remaining possibility, that $(\alpha', g')$ is $D(\theta, G_0)$, a contradiction which
proves the theorem.

It is easy to verify that the postulated conditions are satisfied in the problem
of Section 6. We have already seen that there $J(\theta, G_0)$ is not in $C_2$. Let

$\psi(a, b, c_{11}, c_{12}, c_{22}, l) = c_{11} + c_{22}.$

Then, in any $T(m)$, $\psi$ attains its unique maximum on the canonical complex.
The function $\psi$ is obviously continuous on $A$. Thus it satisfies the requirements
of the theorem of the present section.
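
The selection rule behind the maximum index estimator can be sketched for a finite list of candidates. This shows only the selection step: the candidate labels, distances, and index values below are hypothetical numbers, the maximization is exact (so the slack $\gamma(n)$ is not needed), and returning `None` stands in for the "defined in any manner" clause.

```python
def maximum_index_estimate(candidates, distance, index, d_n):
    """Among candidates within distance d_n of the empiric d.f., pick the largest index."""
    admissible = [c for c in candidates if distance(c) < d_n]
    if not admissible:
        return None  # the text permits any measurable choice in this case
    return max(admissible, key=index)

# Hypothetical candidates: (label, distance to F_n, index value such as c11 + c22).
toy_candidates = [
    ("m1", 0.010, 1.0),
    ("m2", 0.020, 2.5),
    ("m3", 0.200, 9.0),   # largest index, but too far from F_n to be admissible
]

chosen = maximum_index_estimate(
    toy_candidates,
    distance=lambda m: m[1],
    index=lambda m: m[2],
    d_n=0.05,
)
print(chosen[0])  # m2
```

The candidate with the largest index overall is rejected because it fails the distance requirement, which is the interplay between the two requirements on $U(C_n)$.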


8. Application to stochastic difference equations. Let it be known to the 
statistician that $u_0, u_1, u_2, \dots$ are independent chance variables with the
common one-dimensional d.f. $G$, which is unknown to the statistician. Also it is
known that, for $j = 1, 2, \dots$

(8.1)  $X_j = u_j + \alpha u_{j-1}$

where $\alpha$ is a constant less than one in absolute value but otherwise unknown to
the statistician. The problem is to estimate $\alpha$ consistently, under minimal
assumptions on $G$.

Let $q$ be the generic designation of a couple $(a, L)$, with $a$ real and less than
one in absolute value, and $L$ a one-dimensional d.f. which does not assign
probability one to a single point. Let $A = \{q\}$. Let $J(q)$ be the d.f. of $(X_1, X_2)$ when
$\alpha = a$ and $G = L$. Let $F_n$ be the two-dimensional empiric d.f. of

(8.2)  $\{(X_{2j-1}, X_{2j}), \; j = 1, 2, \dots, n\}.$

Finally let $q_0 = (\alpha, G)$; of course, $q_0$ is in $A$.
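
A short simulation makes the pairing in (8.2) concrete. The sketch assumes the structure reads $X_j = u_j + \alpha u_{j-1}$ and takes a centered exponential law for the $u_j$ purely as an example of a non-degenerate $G$; the function name is hypothetical. The within-pair correlation is $\alpha / (1 + \alpha^2)$, a standard property of this moving average, which shows one way the d.f. of $(X_1, X_2)$ carries information about $\alpha$.

```python
import numpy as np

def simulate_pairs(n_pairs, alpha, rng):
    """Return X_1..X_{2n} and the n pairs (X_{2j-1}, X_{2j}) of (8.2)."""
    u = rng.exponential(1.0, size=2 * n_pairs + 1) - 1.0   # u_0, ..., u_{2n} (assumed law)
    x = u[1:] + alpha * u[:-1]                             # X_j = u_j + alpha * u_{j-1}
    pairs = x.reshape(n_pairs, 2)                          # row j-1 is (X_{2j-1}, X_{2j})
    return x, pairs

rng = np.random.default_rng(3)
alpha = 0.5
x, pairs = simulate_pairs(50_000, alpha, rng)
within_pair_corr = float(np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1])
print(within_pair_corr, alpha / (1 + alpha**2))
```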







If $G$ were to assign probability one to a single point then it is obvious that $\alpha$
would not be identified. The necessary condition, that $G$ not assign probability
one to a single point, is also sufficient, and the d.f. of $(X_1, X_2)$ then determines
$\alpha$ ($|\alpha| < 1$) uniquely. Even more: Let $q'$ be a couple $(a', L')$, where $|a'| \le 1$,
and $L'$ is a d.f. which does not assign probability one to a single point. Suppose
that $J(q_0) = J(q')$. Then $a' = \alpha$, hence is less than one in absolute value, and
$q'$ must be in $A$. For it follows from Theorem 1 of [10] that, if $\alpha$ were not uniquely
determined, $G$ would have to be normal. The possibility that $G$ is normal and
$\alpha \ne a'$ is then easily eliminated. (In [1] through an oversight it is erroneously
stated that the d.f. of $X_1$ already determines $\alpha$ uniquely. Attention has been
called to this error in, e.g., [8], page 211, footnote 6.)

Although the members of the sequence (8.2) are not independent, the two
sequences made up of alternate members of this sequence are sequences of
independent chance variables, and it is easy to show, as was done in [4], that

(8.3)  $\delta(J(q_0), F_n) \to 0.$


The minimum distance estimator of $\alpha$ is obtained in the usual manner. Let
$S_n = (\alpha_n^*, G_n^*)$ be any function from the $2n$-dimensional Euclidean space of
$(X_1, \dots, X_{2n})$ to the space $A$ which is measurable and such that

(8.4)  $\delta(J(\alpha_n^*, G_n^*), F_n) < \inf_{q \in A} \delta(J(q), F_n) + \gamma(n).$

Then $\alpha_n^*$ is a minimum distance estimator of $\alpha$, to which it converges w.p.1.
To prove the latter we have only to show that $J(\alpha, G) = J(q_0)$ is not in $C_2$.
Let $\bar{A}$ be as defined in Section 4. Any member $\bar{q} = (\bar{a}, \bar{L})$ of $\bar{A}$ has one of the
following properties:

1) $\bar{L}$ assigns probability one to a single point.

2) $\bar{L}$ is a d.f. which does not assign probability one to a single point, and
$\bar{a} = \pm 1$.

3) $\bar{L}$ has variation less than one.

Suppose $J(q_0) = J(\bar{q})$. Then $\bar{q}$ cannot have the first of these properties,
because then $X_j$ = constant w.p.1. Also $\bar{q}$ cannot have the second of these
properties, by the result described in the third paragraph of this section. If $\bar{q}$ had the
third of these properties then either $J(\bar{q})$ would have variation less than one or
$J(\bar{q})$ would assign probability one to a single point, neither of which can be true
of $J(q_0)$. Hence $J(q_0)$ is not in $C_2$.

The author is grateful to Professors Henry Teicher and Lionel Weiss for read- 
ing the manuscript. 


REFERENCES 
[1] J. Wolfowitz, "Estimation of the components of stochastic structures," Proc. Nat.
Acad. Sci., Vol. 40 (1954), pp. 602-606.

[2] J. Wolfowitz, "Consistent estimators of the parameters of a linear structural
relation," Skand. Aktuarietids., 1952, pp. 132-151.

[3] J. Wolfowitz, "Estimation by the minimum distance method," Ann. Inst. Stat. Math.
(Japan), Vol. 5 (1953), pp. 9-23.

[4] J. Wolfowitz, "Estimation by the minimum distance method in non-parametric
stochastic difference equations," Ann. Math. Stat., Vol. 25 (1954), pp. 203-217.

[5] O. Reiersøl, "Identifiability of a linear relation between variables which are
subject to error," Econometrica, Vol. 18 (1950), pp. 375-89.

[6] K. L. Chung, "An estimate concerning the Kolmogoroff distribution," Trans. Amer.
Math. Soc., Vol. 67 (1949), pp. 36-50.

[7] H. Cramér, Random Variables and Probability Distributions, Cambridge University
Press, 1937.

[8] M. Kac, J. Kiefer, and J. Wolfowitz, "On tests of normality and other tests of
goodness of fit based on distance methods," Ann. Math. Stat., Vol. 26 (1955),
pp. 189-211.

[9] J. Kiefer and J. Wolfowitz, "Consistency of the maximum likelihood estimator in
the presence of infinitely many incidental parameters," Ann. Math. Stat., Vol. 27
(1956), pp. 887-906.

[10] H. Teicher, "Identification of a certain stochastic structure," Econometrica, Vol. 24
(1956), pp. 172-177.

[11] J. Neyman and E. L. Scott, "Consistent estimators based on partially consistent
observations," Econometrica, Vol. 16 (1948), pp. 1-32.





STATISTICAL INFERENCE ABOUT MARKOV CHAINS 


T. W. ANDERSON AND LEO A. GOODMAN¹
Columbia University and University of Chicago 


Summary. Maximum likelihood estimates and their asymptotic distribution 
are obtained for the transition probabilities in a Markov chain of arbitrary 
order when there are repeated observations of the chain. Likelihood ratio tests 
and $\chi^2$-tests of the form used in contingency tables are obtained for testing the
following hypotheses: (a) that the transition probabilities of a first order chain 
are constant, (b) that in case the transition probabilities are constant, they are 
specified numbers, and (c) that the process is a uth order Markov chain against 
the alternative it is rth but not uth order. In case u = 0 and r = 1, case (c) 
results in tests of the null hypothesis that observations at successive time points 
are statistically independent against the alternate hypothesis that observations 
are from a first order Markov chain. Tests of several other hypotheses are also 
considered. The statistical analysis in the case of a single observation of a long 
chain is also discussed. There is some discussion of the relation between likelihood
ratio criteria and $\chi^2$-tests of the form used in contingency tables.


1. Introduction. A Markov chain is sometimes a suitable probability model 
for certain time series in which the observation at a given time is the category 
into which an individual falls. The simplest Markov chain is that in which 
there are a finite number of states or categories and a finite number of equi- 
distant time points at which observations are made, the chain is of first-order, 
and the transition probabilities are the same for each time interval. Such a 
chain is described by the initial state and the set of transition probabilities; 
namely, the conditional probability of going into each state, given the im- 
mediately preceding state. We shall consider methods of statistical inference 
for this model when there are many observations in each of the initial states
and the same set of transition probabilities operate. For example, one may wish 
to estimate the transition probabilities or test hypotheses about them. We de- 
velop an asymptotic theory for these methods of inference when the number of 
observations increases. We shall also consider methods of inference for more 
general models, for example, where the transition probabilities need not be the 
same for each time interval. 

An illustration of the use of some of the statistical methods described herein 
has been given in detail [2]. The data for this illustration came from a “panel 
study” on vote intention. Preceding the 1940 presidential election each of a 
number of potential voters was asked his party or candidate preference each 


Received August 29, 1955; revised October 18, 1956. 
1 This work was carried out under the sponsorship of the Social Science Research Council, 
The RAND Corporation, and the Statistics Branch, Office of Naval Research. 




month from May to October (6 interviews). At each interview each person was 
classified as Republican, Democrat, or “Don’t Know,” the latter being a residual
category consisting primarily of people who had not decided on a party or 
candidate. One of the null hypotheses in the study was that the probability of 
a voter’s intention at one interview depended only on his intention at the im- 
mediately preceding interview (first-order case), that such a probability was 
constant over time (stationarity), and that the same probabilities hold for all 
individuals. It was of interest to see how the data conformed to this null
hypothesis, and also in what specific ways the data differed from this hypothesis.

This present paper develops and extends the theory and the methods given 
in [1] and [2]. It also presents some newer methods, which were first mentioned
in [9], that are somewhat different from those given in [1] and [2], and explains 
how to use both the old and new methods for dealing with more general hy- 
potheses. Some corrections of formulas appearing in [1] and [2] are also given 
in the present paper. An advantage of some of the new methods presented 
herein is that, for many users of these methods, their motivation and their 
application seem to be simpler. 

The problem of the estimation of the transition probabilities, and of the
testing of goodness of fit and the order of the chain has been studied by Bartlett
[3] and Hoel [10] in the situation where only a single sequence of states is
observed; they consider the asymptotic theory as the number of time points
increases. We shall discuss this situation in Section 5 of the present paper, where
a $\chi^2$-test of the form used in contingency tables is given for a hypothesis that is
a generalization of a hypothesis that was considered from the likelihood ratio
point of view by Hoel [10].

In the present paper, we present both likelihood ratio criteria and $\chi^2$-tests,
and it is shown how these methods are related to some ordinary contingency
table procedures. A discussion of the relation between likelihood ratio tests
and $\chi^2$-tests appears in the final section.

For further discussion of Markov chains, the reader is referred to [2] or [7]. 


2. Estimation of the parameters of a first-order Markov chain. 


2.1. The model. Let the states be $i = 1, 2, \dots, m$. Though the state $i$ is
usually thought of as an integer running from 1 to $m$, no actual use is made of
this ordered arrangement, so that $i$ might be, for example, a political party, a
geographical place, a pair of numbers $(a, b)$, etc. Let the times of observation
be $t = 0, 1, \dots, T$. Let $p_{ij}(t)$ ($i, j = 1, \dots, m$; $t = 1, \dots, T$) be the
probability of state $j$ at time $t$, given state $i$ at time $t - 1$. We shall deal both with
(a) stationary transition probabilities (that is, $p_{ij}(t) = p_{ij}$ for $t = 1, \dots, T$)
and with (b) nonstationary transition probabilities (that is, where the transition
probabilities need not be the same for each time interval). We assume in this
section that there are $n_i(0)$ individuals in state $i$ at $t = 0$. In this section, we
treat the $n_i(0)$ as though they were nonrandom, while in Section 4, we shall
discuss the case where they are random variables. An observation on a given







individual consists of the sequence of states the individual is in at $t = 0, 1, \dots, T$,
namely $i(0), i(1), i(2), \dots, i(T)$. Given the initial state $i(0)$, there are $m^T$
possible sequences. These represent mutually exclusive events with probabilities

(2.1)  $p_{i(0)i(1)} \, p_{i(1)i(2)} \cdots p_{i(T-1)i(T)}$

when the transition probabilities are stationary. (When the transition
probabilities are not necessarily stationary, symbols of the form $p_{i(t-1)i(t)}$ should be
replaced by $p_{i(t-1)i(t)}(t)$ throughout.)
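
The product (2.1) is easy to compute directly. The sketch below assumes stationary transition probabilities stored in a matrix `p` with states coded 0, ..., m - 1 (the text numbers them from 1); the two-state matrix is a made-up example. Summing the probabilities of all $m^T$ sequences that start from a fixed initial state gives a check that they total one.

```python
from itertools import product

import numpy as np

def sequence_probability(states, p):
    """Probability (2.1) of i(1), ..., i(T) given i(0), for a stationary matrix p."""
    prob = 1.0
    for prev, nxt in zip(states[:-1], states[1:]):
        prob *= p[prev, nxt]
    return float(prob)

# Hypothetical two-state transition matrix; each row sums to one.
p = np.array([[0.9, 0.1],
              [0.4, 0.6]])

example = sequence_probability([0, 0, 1, 1], p)   # the product 0.9 * 0.1 * 0.6

# All m**T = 2**3 sequences starting from state 0 are mutually exclusive and exhaustive.
total = sum(sequence_probability((0,) + tail, p) for tail in product(range(2), repeat=3))
print(example, total)
```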


Let $n_{ij}(t)$ denote the number of individuals in state $i$ at $t - 1$ and $j$ at $t$.
We shall show that the set of $n_{ij}(t)$ ($i, j = 1, \ldots, m$; $t = 1, \ldots, T$), a set
of $m^2 T$ numbers, form a set of sufficient statistics for the observed sequences.
Let $n_{i(0)i(1)\cdots i(T)}$ be the number of individuals whose sequence of states is
$i(0), i(1), \ldots, i(T)$. Then

$$n_{gj}(t) = \sum n_{i(0)i(1)\cdots i(T)}, \tag{2.2}$$

where the sum is over all values of the $i$'s with $i(t - 1) = g$ and $i(t) = j$. The
probability, in the $nmT$ dimensional space describing all sequences for all $n$
individuals (for each initial state there are $nT$ dimensions), of a given ordered
set of sequences for the $n$ individuals is


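$$\begin{aligned}
&\prod \left[p_{i(0)i(1)}(1)\, p_{i(1)i(2)}(2) \cdots p_{i(T-1)i(T)}(T)\right]^{n_{i(0)i(1)\cdots i(T)}} \\
&\quad= \left(\prod \left[p_{i(0)i(1)}(1)\right]^{n_{i(0)i(1)\cdots i(T)}}\right) \cdots \left(\prod \left[p_{i(T-1)i(T)}(T)\right]^{n_{i(0)i(1)\cdots i(T)}}\right) \\
&\quad= \prod_{t=1}^{T} \prod_{g,j} p_{gj}(t)^{n_{gj}(t)},
\end{aligned} \tag{2.3}$$

where the products in the first two lines are over all values of the $T + 1$ indices.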
Thus, the set of numbers $n_{ij}(t)$ form a set of sufficient statistics, as announced.
The actual distribution of the $n_{ij}(t)$ is (2.3) multiplied by an appropriate
function of factorials. Let $n_i(t - 1) = \sum_{j=1}^{m} n_{ij}(t)$. Then the conditional
distribution of $n_{ij}(t)$, $j = 1, \ldots, m$, given $n_i(t - 1)$ (or given $n_k(s)$, $k = 1, \ldots, m$;
$s = 0, \ldots, t - 1$) is

$$\frac{n_i(t-1)!}{\prod_{j=1}^{m} n_{ij}(t)!} \prod_{j=1}^{m} p_{ij}(t)^{n_{ij}(t)}. \tag{2.4}$$

This is the same distribution as one would obtain if one had $n_i(t - 1)$ observations
on a multinomial distribution with probabilities $p_{ij}(t)$ and with resulting
numbers $n_{ij}(t)$. The distribution of the $n_{ij}(t)$ (conditional on the $n_i(0)$) is

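$$\prod_{t=1}^{T} \left\{ \prod_{i=1}^{m} \left[ \frac{n_i(t-1)!}{\prod_{j=1}^{m} n_{ij}(t)!} \prod_{j=1}^{m} p_{ij}(t)^{n_{ij}(t)} \right] \right\}. \tag{2.5}$$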





For a Markov chain with stationary transition probabilities, a stronger result
concerning sufficiency follows from (2.3); namely, the set $n_{ij} = \sum_{t=1}^{T} n_{ij}(t)$
form a set of sufficient statistics. This follows from the fact that, when the
transition probabilities are stationary, the probability (2.3) can be written
in the form

$$\prod_{t=1}^{T} \prod_{g,j} p_{gj}^{\,n_{gj}(t)} = \prod_{i,j} p_{ij}^{\,n_{ij}}. \tag{2.6}$$

For not necessarily stationary transition probabilities $p_{ij}(t)$, the $n_{ij}(t)$ are a
minimal set of sufficient statistics.
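
The counting behind (2.2) and (2.6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: states are coded 0, ..., m − 1, every individual is observed at all times t = 0, ..., T, and the function names `transition_counts` and `pooled_counts` and the toy sequences are hypothetical, not taken from the paper.

```python
def transition_counts(sequences, m):
    """Return counts with counts[t - 1][i][j] = n_ij(t).

    `sequences` holds one state sequence per individual (states coded 0..m-1),
    all observed at t = 0, 1, ..., T.
    """
    T = len(sequences[0]) - 1
    counts = [[[0] * m for _ in range(m)] for _ in range(T)]
    for seq in sequences:
        for t in range(1, T + 1):
            counts[t - 1][seq[t - 1]][seq[t]] += 1
    return counts


def pooled_counts(counts, m):
    """Sum n_ij(t) over t to get n_ij, sufficient when the chain is stationary."""
    return [[sum(table[i][j] for table in counts) for j in range(m)]
            for i in range(m)]


# Toy data (assumed): four individuals, two states, T = 2.
sequences = [[0, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]]
counts = transition_counts(sequences, m=2)
pooled = pooled_counts(counts, m=2)
```

The list `counts` carries the $m^2 T$ numbers needed in the nonstationary case, while `pooled` carries the $m^2$ numbers that suffice for a stationary chain.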


2.2. Maximum likelihood estimates. The stationary transition probabilities
$p_{ij}$ can be estimated by maximizing the probability (2.6) with respect to the
$p_{ij}$, subject of course to the restrictions $p_{ij} \ge 0$ and

$$\sum_{j=1}^{m} p_{ij} = 1, \tag{2.7}$$

when the $n_{ij}$ are the actual observations. This probability is precisely of the
same form, except for a factor that does not depend on $p_{ij}$, as that obtained
for $m$ independent samples, where the $i$th sample ($i = 1, 2, \ldots, m$) consists of
$n_i^* = \sum_j n_{ij}$ multinomial trials with probabilities $p_{ij}$ ($i, j = 1, 2, \ldots, m$). For
such samples, it is well-known and easily verified that the maximum likelihood
estimates for $p_{ij}$ are

$$\hat p_{ij} = n_{ij}/n_i^* = \sum_{t=1}^{T} n_{ij}(t) \Big/ \sum_{k=1}^{m} \sum_{t=1}^{T} n_{ik}(t) = \sum_{t=1}^{T} n_{ij}(t) \Big/ \sum_{t=0}^{T-1} n_i(t), \tag{2.8}$$

and hence this is also true for any other distribution in which the elementary
probability is of the same form except for parameter-free factors, and the
restrictions on the $p_{ij}$ are the same. In particular, it applies to the estimation of
the parameters $p_{ij}$ in (2.6).

When the transition probabilities are not necessarily stationary, the general
approach used in the preceding paragraph can still be applied, and the maximum
likelihood estimates for the $p_{ij}(t)$ are found to be

$$\hat p_{ij}(t) = n_{ij}(t)/n_i(t-1) = n_{ij}(t) \Big/ \sum_{k=1}^{m} n_{ik}(t). \tag{2.9}$$

The same maximum likelihood estimates for the $p_{ij}(t)$ are obtained when we
consider the conditional distribution of $n_{ij}(t)$ given $n_i(t - 1)$ as when the joint
distribution of the $n_{ij}(1), n_{ij}(2), \ldots, n_{ij}(T)$ is used. Formally these estimates
are the same as one would obtain if for each $i$ and $t$ one had $n_i(t - 1)$ observations
on a multinomial distribution with probabilities $p_{ij}(t)$ and with resulting
numbers $n_{ij}(t)$.





The estimates can be described in the following way: Let the entries $n_{ij}(t)$
for given $t$ be entered in a two-way $m \times m$ table. The estimate of $p_{ij}(t)$ is the
$i, j$th entry in the table divided by the sum of the entries in the $i$th row. In order
to estimate $p_{ij}$ for a stationary chain, add the corresponding entries in the two-way
tables for $t = 1, \ldots, T$, obtaining a two-way table with entries $n_{ij} = \sum_t n_{ij}(t)$.
The estimate of $p_{ij}$ is the $i, j$th entry of the table of $n_{ij}$'s divided by
the sum of the entries in the $i$th row.
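
The row-normalization just described might be coded as follows. This is a hedged sketch: the helper names `row_normalize` and `mle_by_time` and the count tables are assumptions for illustration, and a row with no observations is returned as `None` because (2.8) and (2.9) are undefined there.

```python
def row_normalize(table):
    """Divide each entry by its row sum, as in (2.8); empty rows give None."""
    estimates = []
    for row in table:
        total = sum(row)
        estimates.append([x / total for x in row] if total else None)
    return estimates


def mle_by_time(counts):
    """Apply (2.9) to each two-way table n_ij(t), t = 1, ..., T."""
    return [row_normalize(table) for table in counts]


# Assumed toy tables n_ij(1) and n_ij(2) for m = 2 states.
counts = [[[2, 1], [0, 1]], [[1, 1], [1, 1]]]
pooled = [[counts[0][i][j] + counts[1][i][j] for j in range(2)] for i in range(2)]

p_hat_t = mle_by_time(counts)      # nonstationary estimates, one table per t
p_hat = row_normalize(pooled)      # stationary estimates from the summed table
```

Here `p_hat[0]` is `[0.6, 0.4]`, since the pooled first row is (3, 2).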

The covariance structure of the maximum likelihood estimates presented in 
this section will be given further on. 


2.3. Asymptotic behavior of $n_{ij}(t)$. To find the asymptotic behavior of the
$\hat p_{ij}$, first consider the $n_{ij}(t)$. We shall assume that $n_k(0)/\sum_j n_j(0) \to \eta_k$
($\eta_k > 0$, $\sum_k \eta_k = 1$) as $\sum_j n_j(0) \to \infty$. For each $i(0)$, the set $n_{i(0)i(1)\cdots i(T)}$ are
simply multinomial variables with sample size $n_{i(0)}(0)$ and parameters
$p_{i(0)i(1)}\, p_{i(1)i(2)} \cdots p_{i(T-1)i(T)}$, and hence are asymptotically normally distributed
as the sample size increases. The $n_{ij}(t)$ are linear combinations of these
multinomial variables, and hence are also asymptotically normally distributed.

Let $P = (p_{ij})$ and let $p_{ij}^{[t]}$ be the elements of the matrix $P^t$. Then $p_{ij}^{[t]}$ is the
probability of state $j$ at time $t$ given state $i$ at time 0. Let $n_{k;ij}(t)$ be the number
of sequences including state $k$ at time 0, $i$ at time $t - 1$ and $j$ at time $t$. Then
we seek the low order moments of

$$n_{ij}(t) = \sum_{k=1}^{m} n_{k;ij}(t). \tag{2.10}$$

The probability associated with $n_{k;ij}(t)$ is $p_{ki}^{[t-1]} p_{ij}$ with a sample size of $n_k(0)$.
Thus

$$\mathcal{E}\, n_{k;ij}(t) = n_k(0)\, p_{ki}^{[t-1]} p_{ij}, \tag{2.11}$$

$$\operatorname{Var}\{n_{k;ij}(t)\} = n_k(0)\, p_{ki}^{[t-1]} p_{ij} \left[1 - p_{ki}^{[t-1]} p_{ij}\right], \tag{2.12}$$

$$\operatorname{Cov}\{n_{k;ij}(t), n_{k;gh}(t)\} = -n_k(0)\, p_{ki}^{[t-1]} p_{ij}\, p_{kg}^{[t-1]} p_{gh}, \qquad (i, j) \ne (g, h), \tag{2.13}$$

since the set of $n_{k;ij}(t)$ follows a multinomial distribution. Covariances between
other variables were given in [1].
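
As a numerical illustration of (2.10) and (2.11), the expected table $\mathcal{E}\, n_{ij}(t) = \sum_k n_k(0)\, p_{ki}^{[t-1]} p_{ij}$ can be computed with a plain matrix power. The sketch below assumes a stationary chain; the helper names and the matrix `P` and initial counts `n0` are made-up inputs, not values from the paper.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, power):
    """P raised to a nonnegative integer power (identity for power 0)."""
    m = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for _ in range(power):
        result = mat_mul(result, p)
    return result


def expected_counts(n0, p, t):
    """E n_ij(t) = sum_k n_k(0) p_ki^[t-1] p_ij, summing (2.11) over k."""
    m = len(p)
    p_prev = mat_pow(p, t - 1)
    # Expected number of individuals in state i at time t - 1.
    occupancy = [sum(n0[k] * p_prev[k][i] for k in range(m)) for i in range(m)]
    return [[occupancy[i] * p[i][j] for j in range(m)] for i in range(m)]


P = [[0.5, 0.5], [0.25, 0.75]]   # assumed transition matrix
n0 = [100, 100]                  # assumed nonrandom initial counts n_k(0)
e1 = expected_counts(n0, P, 1)
e2 = expected_counts(n0, P, 2)
```

With these inputs the expected occupancies at $t = 1$ are 75 and 125, so `e2` is `[[37.5, 37.5], [31.25, 93.75]]`.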

Let us now examine moments of $n_{k;ij}(t) - n_{k;i}(t - 1)p_{ij}$, where $n_{k;i}(t - 1) =
\sum_j n_{k;ij}(t)$; they will be needed in obtaining the asymptotic theory for test
procedures. The conditional distribution of $n_{k;ij}(t)$ given $n_{k;i}(t - 1)$ is easily
seen to be multinomial, with the probabilities $p_{ij}$. Thus,

$$\mathcal{E}\{n_{k;ij}(t) \mid n_{k;i}(t-1)\} = p_{ij}\, n_{k;i}(t-1), \tag{2.14}$$

$$\mathcal{E}\{n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}\} = \mathcal{E}\,\mathcal{E}\{[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}] \mid n_{k;i}(t-1)\} = 0. \tag{2.15}$$





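The variance of this quantity is

$$\begin{aligned}
\mathcal{E}[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}]^2
&= \mathcal{E}\,\mathcal{E}\{[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}]^2 \mid n_{k;i}(t-1)\} \\
&= \mathcal{E}\, n_{k;i}(t-1)\, p_{ij}(1 - p_{ij}) \\
&= n_k(0)\, p_{ki}^{[t-1]} p_{ij}(1 - p_{ij}).
\end{aligned} \tag{2.16}$$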
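The covariances of pairs of such quantities are

$$\mathcal{E}[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}][n_{k;ih}(t) - n_{k;i}(t-1)p_{ih}] = -n_k(0)\, p_{ki}^{[t-1]} p_{ij} p_{ih}, \qquad j \ne h, \tag{2.17}$$

$$\mathcal{E}[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}][n_{k;gh}(t) - n_{k;g}(t-1)p_{gh}] = 0, \qquad i \ne g, \tag{2.18}$$

$$\mathcal{E}[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}][n_{k;gh}(t+r) - n_{k;g}(t+r-1)p_{gh}] = 0, \qquad r > 0. \tag{2.19}$$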


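To summarize, the random variables $n_{k;ij}(t) - n_{k;i}(t - 1)p_{ij}$ for $j = 1, \ldots, m$
have means 0 and variances and covariances of multinomial variables with
probabilities $p_{ij}$ and sample size $n_k(0)p_{ki}^{[t-1]}$. The variables $n_{k;ij}(t) - n_{k;i}(t - 1)p_{ij}$
and $n_{k;gh}(s) - n_{k;g}(s - 1)p_{gh}$ are uncorrelated if $t \ne s$ or $i \ne g$.

Since we assume $n_k(0)$ fixed, $n_{k;ij}(t)$ and $n_{l;gh}(s)$ are independent if $k \ne l$.
Thus

$$\mathcal{E}[n_{ij}(t) - n_i(t-1)p_{ij}] = 0, \tag{2.20}$$

$$\mathcal{E}[n_{ij}(t) - n_i(t-1)p_{ij}]^2 = \sum_{k=1}^{m} n_k(0)\, p_{ki}^{[t-1]} p_{ij}(1 - p_{ij}), \tag{2.21}$$

$$\mathcal{E}[n_{ij}(t) - n_i(t-1)p_{ij}][n_{ih}(t) - n_i(t-1)p_{ih}] = -\sum_{k=1}^{m} n_k(0)\, p_{ki}^{[t-1]} p_{ij} p_{ih}, \qquad j \ne h, \tag{2.22}$$

$$\mathcal{E}[n_{ij}(t) - n_i(t-1)p_{ij}][n_{gh}(s) - n_g(s-1)p_{gh}] = 0, \qquad t \ne s \text{ or } i \ne g. \tag{2.23}$$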





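2.4. The asymptotic distribution of the estimates. It will now be shown that
when $n \to \infty$,

$$\sqrt{n}\,(\hat p_{ij} - p_{ij}) = \sqrt{n}\left[\frac{\sum_{t=1}^{T} n_{ij}(t)}{\sum_{t=1}^{T} n_i(t-1)} - p_{ij}\right] = \sqrt{n}\,\frac{\sum_{t=1}^{T} [n_{ij}(t) - p_{ij}\, n_i(t-1)]}{\sum_{t=1}^{T} n_i(t-1)} \tag{2.24}$$

has a limiting normal distribution, and the means, variances and covariances
of the limiting distribution will be found. Because $n_{k;ij}(t)$ is a multinomial
variable, we know that

$$n_{k;ij}(t)/n = [n_{k;ij}(t)/n_k(0)]\,[n_k(0)/n] \tag{2.25}$$

converges in probability to its expected value when $n_k(0)/n \to \eta_k$. Thus

$$\operatorname*{p\,lim}_{n \to \infty} \frac{1}{n} \sum_{t=1}^{T} n_i(t-1) = \lim_{n \to \infty} \frac{1}{n}\, \mathcal{E} \sum_{t=1}^{T} n_i(t-1) = \sum_{k=1}^{m} \sum_{t=1}^{T} \eta_k\, p_{ki}^{[t-1]}. \tag{2.26}$$

Therefore $n^{1/2}(\hat p_{ij} - p_{ij})$ has the same limit distribution as

$$\frac{\sum_{t=1}^{T} [n_{ij}(t) - p_{ij}\, n_i(t-1)]/n^{1/2}}{\sum_{k=1}^{m} \sum_{t=1}^{T} \eta_k\, p_{ki}^{[t-1]}} \tag{2.27}$$

(see p. 254 in [6]).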


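From the conclusions in Section 2.3, the numerator of (2.27) has mean 0 and
variance

$$\mathcal{E}\left[\sum_{t=1}^{T} \{n_{ij}(t) - p_{ij}\, n_i(t-1)\}\right]^2 \Big/ n = \sum_{k=1}^{m} \sum_{t=1}^{T} n_k(0)\, p_{ki}^{[t-1]} p_{ij}(1 - p_{ij}) / n. \tag{2.28}$$

The covariance between two different numerators is

$$\mathcal{E}\left[\sum_{t=1}^{T} \{n_{ij}(t) - p_{ij}\, n_i(t-1)\}\right]\left[\sum_{t=1}^{T} \{n_{gh}(t) - p_{gh}\, n_g(t-1)\}\right] \Big/ n = -\delta_{ig} \sum_{k=1}^{m} \sum_{t=1}^{T} n_k(0)\, p_{ki}^{[t-1]} p_{ij} p_{gh} / n, \tag{2.29}$$

where $\delta_{ig} = 0$ if $i \ne g$ and $\delta_{ii} = 1$.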





Let

$$\sum_{k=1}^{m} \sum_{t=1}^{T} \eta_k\, p_{ki}^{[t-1]} = \phi_i. \tag{2.30}$$

Then the limiting variance of the numerator of (2.27) is $\phi_i p_{ij}(1 - p_{ij})$, and
the limiting covariance between two different numerators is $-\delta_{ig} \phi_i p_{ij} p_{gh}$.
Because the numerators of (2.27) are linear combinations of normalized multinomial
variables, with fixed probabilities and increasing sample size, they have
a limiting normal distribution and the variances and covariances of this limit
distribution are the limits of the respective variances and covariances (see, e.g.,
Theorem 2, p. 5 in [4]).

Since $n^{1/2}(\hat p_{ij} - p_{ij})$ has the same limit distribution as (2.27), the variables
$n^{1/2}(\hat p_{ij} - p_{ij})$ have a limiting joint normal distribution with means 0, variances
$p_{ij}(1 - p_{ij})/\phi_i$ and the covariances $-\delta_{ig} p_{ij} p_{gh}/\phi_i$. The variables $(n\phi_i)^{1/2}(\hat p_{ij} - p_{ij})$
have a limiting joint normal distribution with means 0, variances $p_{ij}(1 - p_{ij})$
and covariances $-\delta_{ig} p_{ij} p_{gh}$. Also, the set $(n_i^*)^{1/2}(\hat p_{ij} - p_{ij})$ has a limiting
joint normal distribution with means 0, variances $p_{ij}(1 - p_{ij})$ and covariances
$-\delta_{ig} p_{ij} p_{gh}$, where $n_i^* = \sum_{t=0}^{T-1} n_i(t)$.

In other terms, the set $(n\phi_i)^{1/2}(\hat p_{ij} - p_{ij})$ for a given $i$ has the same limiting
distribution as the estimates of multinomial probabilities $p_{ij}$ with sample size
$n\phi_i$, which is the expected total number of observations $n_i^*$ in the $i$th state for
$t = 0, \ldots, T - 1$. The variables $(n\phi_i)^{1/2}(\hat p_{ij} - p_{ij})$ for $m$ different values of $i$
($i = 1, 2, \ldots, m$) are asymptotically independent (i.e., the limiting joint
distribution factors), and hence have the same limiting joint distribution as
obtained from similar functions of the estimates of multinomial probabilities
$p_{ij}$ from $m$ independent samples with sample sizes $n\phi_i$ ($i = 1, 2, \ldots, m$). It
will often be possible to reformulate hypotheses about the $p_{ij}$ in terms of $m$
independent samples consisting of multinomial trials.
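
One practical reading of this result is that each row of the pooled table can be treated like a multinomial sample of size $n_i^*$. The sketch below computes large-sample standard errors by plugging $\hat p_{ij}$ into $p_{ij}(1 - p_{ij})/n_i^*$; the plug-in step, the function name `mle_with_standard_errors`, and the counts are assumptions made for illustration and are not spelled out in the text.

```python
import math


def mle_with_standard_errors(pooled):
    """Return (p_hat, se) with se[i][j] = sqrt(p_hat_ij (1 - p_hat_ij) / n_i*)."""
    p_hat, se = [], []
    for row in pooled:
        n_star = sum(row)  # n_i*, assumed positive for every state here
        p_row = [x / n_star for x in row]
        p_hat.append(p_row)
        se.append([math.sqrt(p * (1 - p) / n_star) for p in p_row])
    return p_hat, se


pooled = [[30, 20], [10, 40]]  # assumed pooled counts n_ij
p_hat, se = mle_with_standard_errors(pooled)
```

Because the rows are asymptotically independent, standard errors from different rows can be used as if they came from separate samples.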

We shall also make use of the fact that the variables $\hat p_{ij}(t) = n_{ij}(t)/n_i(t - 1)$
for a given $i$ and $t$ have the same asymptotic distribution as the estimates of
multinomial probabilities with sample sizes $\mathcal{E} n_i(t - 1)$, and the variables $\hat p_{ij}(t)$
for two different values of $i$ or two different values of $t$ are asymptotically
independent. This fact can be proved by methods similar to those used earlier in
this section. Hence, in testing hypotheses concerning the $p_{ij}(t)$ it will sometimes
be possible to reformulate the hypotheses in terms of $m \times T$ independent
samples consisting of multinomial trials, and standard test procedures may then
be applied.


3. Tests of hypotheses and confidence regions. 


3.1. Tests of hypotheses about specific probabilities and confidence regions.
On the basis of the asymptotic distribution theory in the preceding section, we
can derive certain methods of statistical inference. Here we shall assume that
every $p_{ij} > 0$.

First we consider testing the hypothesis that certain transition probabilities
$p_{ij}$ have specified values $p_{ij}^0$. We make use of the fact that under the null
hypothesis the $(n_i^*)^{1/2}(\hat p_{ij} - p_{ij}^0)$ have a limiting normal distribution with means
zero, and variances and covariances depending on $p_{ij}^0$ in the same way as
obtains for multinomial estimates. We can use standard asymptotic theory for
multinomial or normal distributions to test a hypothesis about one or more
$p_{ij}$, or determine a confidence region for one or more $p_{ij}$.

As a specific example consider testing the hypothesis that $p_{ij} = p_{ij}^0$, $j = 1, \ldots, m$,
for a given $i$. Under the null hypothesis,

$$\sum_{j=1}^{m} n_i^* \frac{(\hat p_{ij} - p_{ij}^0)^2}{p_{ij}^0} \tag{3.1}$$

has an asymptotic $\chi^2$-distribution with $m - 1$ degrees of freedom (according to
the usual asymptotic theory of multinomial variables). Thus the critical region
of one test of this hypothesis at significance level $\alpha$ consists of the set $\hat p_{ij}$ for
which (3.1) is greater than the $\alpha$ significance point of the $\chi^2$-distribution with
$m - 1$ degrees of freedom. A confidence region of confidence coefficient $\alpha$
consists of the set $p_{ij}^0$ for which (3.1) is less than the $\alpha$ significance point. (The $p_{ij}^0$
in the denominator can be replaced by $\hat p_{ij}$.) Since the variables $n_i^*(\hat p_{ij} - p_{ij}^0)^2$
for different $i$ are asymptotically independent, the forms (3.1) for different $i$ are
asymptotically independent, and hence can be added to obtain other $\chi^2$-variables.
For instance a test for all $p_{ij}$ ($i, j = 1, 2, \ldots, m$) can be obtained by adding
(3.1) over all $i$, resulting in a $\chi^2$-variable with $m(m - 1)$ degrees of freedom.
The use of the $\chi^2$-test of goodness of fit is discussed in [5]. We believe that
there is as good reason for adopting the tests, which are analogous to $\chi^2$-tests
of goodness of fit, described in this section as in the situation from which they
were borrowed (see [5]).
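
For concreteness, the statistic (3.1) for a single state $i$ could be computed as below. This is a sketch only: the function name `chi_square_row`, the observed row of counts, and the hypothesized probabilities are illustrative assumptions, and the comparison with a $\chi^2$ significance point is left to a table or a statistics library.

```python
def chi_square_row(row_counts, p0_row):
    """Statistic (3.1) for one state i: sum_j n_i* (p_hat_ij - p0_ij)^2 / p0_ij.

    Returns the statistic and its degrees of freedom, m - 1.
    Every hypothesized p0_ij is assumed positive, as in the text.
    """
    n_star = sum(row_counts)
    stat = 0.0
    for n_ij, p0 in zip(row_counts, p0_row):
        p_hat = n_ij / n_star
        stat += n_star * (p_hat - p0) ** 2 / p0
    return stat, len(row_counts) - 1


# Assumed example: n_i1 = 30, n_i2 = 20, hypothesized row (0.5, 0.5).
stat, df = chi_square_row([30, 20], [0.5, 0.5])
```

Statistics for different rows $i$ can then be added, with the degrees of freedom adding as well, to test several rows at once.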


3.2. Testing the hypothesis that the transition probabilities are constant.
In the stationary Markov chain, $p_{ij}$ is the probability that an individual in
state $i$ at time $t - 1$ moves to state $j$ at $t$. A general alternative to this assumption
is that the transition probability depends on $t$; let us say it is $p_{ij}(t)$. We test
the null hypothesis $H: p_{ij}(t) = p_{ij}$ ($t = 1, \ldots, T$). Under the alternate
hypothesis, the estimates of the transition probabilities for time $t$ are

$$\hat p_{ij}(t) = \frac{n_{ij}(t)}{n_i(t-1)}. \tag{3.2}$$

The likelihood function maximized under the null hypothesis is

$$\prod_{t=1}^{T} \prod_{i,j} \hat p_{ij}^{\,n_{ij}(t)}. \tag{3.3}$$

The likelihood function maximized under the alternative is

$$\prod_{t} \prod_{i,j} \hat p_{ij}(t)^{n_{ij}(t)}. \tag{3.4}$$





The ratio is the likelihood ratio criterion

$$\lambda = \prod_{t} \prod_{i,j} \left[\frac{\hat p_{ij}}{\hat p_{ij}(t)}\right]^{n_{ij}(t)}. \tag{3.5}$$

A slight extension of a theorem of Cramér [6] or of Neyman [11] shows that
$-2 \log \lambda$ is distributed as $\chi^2$ with $(T - 1)[m(m - 1)]$ degrees of freedom when
the null hypothesis is true.

The likelihood ratio (3.5) resembles likelihood ratios obtained for standard 
tests of homogeneity in contingency tables (see [6], p. 445). We shall now de- 
velop further this similarity to usual procedures for contingency tables. A proof 
that the results obtained by this contingency table approach are asymptotically 
equivalent to those presented earlier in this section will be given in Section 6. 

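For a given $i$, the set $\hat p_{ij}(t)$ has the same asymptotic distribution as the
estimates of multinomial probabilities $p_{ij}(t)$ for $T$ independent samples. An $m \times T$
table, which has the same formal appearance as a contingency table, can be
used to represent the joint estimates $\hat p_{ij}(t)$ for a given $i$ and for $j = 1, 2, \ldots, m$
and $t = 1, 2, \ldots, T$.

$$\begin{array}{c|cccc}
t \backslash j & 1 & 2 & \cdots & m \\ \hline
1 & \hat p_{i1}(1) & \hat p_{i2}(1) & \cdots & \hat p_{im}(1) \\
2 & \hat p_{i1}(2) & \hat p_{i2}(2) & \cdots & \hat p_{im}(2) \\
\vdots & \vdots & \vdots & & \vdots \\
T & \hat p_{i1}(T) & \hat p_{i2}(T) & \cdots & \hat p_{im}(T)
\end{array}$$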

The hypothesis of interest is that the random variables represented by the $T$
rows have the same distribution, so that the data are homogeneous in this
respect. This is equivalent to the hypothesis that there are $m$ constants $p_{i1}$,
$p_{i2}, \ldots, p_{im}$, with $\sum_j p_{ij} = 1$, such that the probability associated with the
$j$th column is equal to $p_{ij}$ in all $T$ rows; that is, $p_{ij}(t) = p_{ij}$ for $t = 1, 2, \ldots, T$.
The $\chi^2$-test of homogeneity seems appropriate here ([6], p. 445); that is, in order
to test this hypothesis, we calculate

$$\chi_i^2 = \sum_{t,j} n_i(t-1)\,[\hat p_{ij}(t) - \hat p_{ij}]^2 / \hat p_{ij}; \tag{3.6}$$

if the null hypothesis is true, $\chi_i^2$ has the usual limiting distribution with
$(m - 1)(T - 1)$ degrees of freedom.

Another test of the hypothesis of homogeneity for $T$ independent samples
from multinomial trials can be obtained by use of the likelihood ratio criterion;
that is, in order to test this hypothesis for the data given in the $m \times T$ table,
calculate

$$\lambda_i = \prod_{t,j} [\hat p_{ij} / \hat p_{ij}(t)]^{n_{ij}(t)}, \tag{3.7}$$

which is formally similar to the likelihood ratio criterion. The asymptotic
distribution of $-2 \log \lambda_i$ is $\chi^2$ with $(m - 1)(T - 1)$ degrees of freedom.





The preceding remarks relating to the contingency table approach dealt
with a given value of $i$. Hence, the hypothesis can be tested separately for each
value of $i$.

Let us now consider the joint hypothesis that $p_{ij}(t) = p_{ij}$ for all $i = 1, 2, \ldots, m$,
$j = 1, 2, \ldots, m$, $t = 1, \ldots, T$. A test of this joint null hypothesis follows
directly from the fact that the random variables $\hat p_{ij}(t)$ and $\hat p_{ij}$ for two different
values of $i$ are asymptotically independent. Hence, under the null hypothesis,
the set of $\chi_i^2$ calculated for each $i = 1, 2, \ldots, m$ are asymptotically independent,
and the sum

$$\chi^2 = \sum_{i=1}^{m} \chi_i^2 = \sum_{i} \sum_{t,j} n_i(t-1)\,[\hat p_{ij}(t) - \hat p_{ij}]^2 / \hat p_{ij} \tag{3.8}$$

has the usual limiting distribution with $m(m - 1)(T - 1)$ degrees of freedom.
Similarly, the test criterion based on (3.5) can be written

$$\sum_{i=1}^{m} -2 \log \lambda_i = -2 \log \lambda. \tag{3.9}$$
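
Both criteria of this section can be computed from the $T$ two-way tables $n_{ij}(t)$. The following sketch returns the $\chi^2$ sum of (3.8), the value $-2 \log \lambda$ of (3.5), and the degrees of freedom $m(m - 1)(T - 1)$. The function name `stationarity_tests`, the zero-count handling, and the example tables are assumptions for illustration; the text itself assumes every $p_{ij} > 0$.

```python
import math


def stationarity_tests(counts):
    """counts[t][i][j] holds n_ij(t + 1). Returns (chi2, minus_2_log_lambda, df)."""
    T, m = len(counts), len(counts[0])
    chi2 = 0.0
    g2 = 0.0
    for i in range(m):
        pooled = [sum(counts[t][i][j] for t in range(T)) for j in range(m)]
        n_star = sum(pooled)
        for t in range(T):
            n_it = sum(counts[t][i])  # n_i(t - 1) in the paper's indexing
            if n_it == 0:
                continue  # nobody in state i at this time, nothing to compare
            for j in range(m):
                if pooled[j] == 0:
                    continue  # pooled estimate is 0; skipped by assumption
                p_pooled = pooled[j] / n_star
                p_t = counts[t][i][j] / n_it
                chi2 += n_it * (p_t - p_pooled) ** 2 / p_pooled
                if counts[t][i][j] > 0:
                    g2 += 2 * counts[t][i][j] * math.log(p_t / p_pooled)
    return chi2, g2, m * (m - 1) * (T - 1)


# Assumed tables for m = 2, T = 2 in which the transition pattern reverses.
counts = [[[20, 10], [10, 20]], [[10, 20], [20, 10]]]
chi2, g2, df = stationarity_tests(counts)
```

For these tables the two statistics are close (about 13.3 and 13.6 on 2 degrees of freedom), which is in line with the asymptotic equivalence discussed in Section 6.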

3.3. Test of the hypothesis that the chain is of a given order. Consider first a
second-order Markov chain. Given that an individual is in state $i$ at $t - 2$ and
in $j$ at $t - 1$, let $p_{ijk}(t)$ ($i, j, k = 1, \ldots, m$; $t = 2, 3, \ldots, T$) be the probability
of being in state $k$ at $t$. When the second-order chain is stationary, $p_{ijk}(t) =
p_{ijk}$ for $t = 2, \ldots, T$. A first-order stationary chain is a special second-order
chain, one for which $p_{ijk}$ does not depend on $i$. On the other hand, as is well-known,
the second-order chain can be represented as a more complicated first-order
chain (see, e.g. [2]). To do this, let the pair of successive states $i$ and $j$
define a composite state $(i, j)$. Then the probability of the composite state
$(j, k)$ at $t$ given the composite state $(i, j)$ at $t - 1$ is $p_{ijk}(t)$. Of course, the
probability of state $(h, k)$, $h \ne j$, given $(i, j)$, is zero. The composite states are easily
seen to form a chain with $m^2$ states and with certain transition probabilities 0.
This representation is useful because some of the results for first-order Markov
chains can be carried over from Section 2.

Now let $n_{ijk}(t)$ be the number of individuals in state $i$ at $t - 2$, in $j$ at $t - 1$,
and in $k$ at $t$, and let $n_{ij}(t - 1) = \sum_k n_{ijk}(t)$. We assume in this section that
the $n_i(0)$ and $n_{ij}(1)$ are nonrandom, extending the idea of the earlier sections
where the $n_i(0)$ were nonrandom and the $n_{ij}(1)$ were random variables. The
$n_{ijk}(t)$ ($i, j, k = 1, \ldots, m$; $t = 2, \ldots, T$) is a set of sufficient statistics for
the different sequences of states. The conditional distribution of $n_{ijk}(t)$, given
$n_{ij}(t - 1)$, is

$$\frac{n_{ij}(t-1)!}{\prod_{k=1}^{m} n_{ijk}(t)!} \prod_{k=1}^{m} p_{ijk}^{\,n_{ijk}(t)}. \tag{3.10}$$

(When the transition probabilities need not be the same for each time interval,
the symbols $p_{ijk}$ should, of course, be replaced by the appropriate $p_{ijk}(t)$
throughout.) The joint distribution of $n_{ijk}(t)$ for $i, j, k = 1, \ldots, m$ and $t = 2, \ldots, T$,
when the set of $n_{ij}(1)$ is given, is the product of (3.10) over $i$, $j$ and $t$.

For chains with stationary transition probabilities, a stronger result concerning
sufficiency can be obtained as it was for first-order chains; namely, the
numbers $n_{ijk} = \sum_{t=2}^{T} n_{ijk}(t)$ form a set of sufficient statistics. The maximum
likelihood estimate of $p_{ijk}$ for stationary chains is

$$\hat p_{ijk} = n_{ijk} \Big/ \sum_{l=1}^{m} n_{ijl} = \sum_{t=2}^{T} n_{ijk}(t) \Big/ \sum_{t=2}^{T} n_{ij}(t-1). \tag{3.11}$$


Now let us consider testing the null hypothesis that the chain is first-order
against the alternative that it is second-order. The null hypothesis is that
$p_{1jk} = p_{2jk} = \cdots = p_{mjk} = p_{jk}$, say, for $j, k = 1, \ldots, m$. The likelihood
ratio criterion for testing this hypothesis is*

$$\lambda = \prod_{i,j,k=1}^{m} (\hat p_{jk} / \hat p_{ijk})^{n_{ijk}}, \tag{3.12}$$

where

$$\hat p_{jk} = \sum_{i=1}^{m} n_{ijk} \Big/ \sum_{i=1}^{m} \sum_{l=1}^{m} n_{ijl} = \sum_{t=2}^{T} n_{jk}(t) \Big/ \sum_{t=1}^{T-1} n_j(t) \tag{3.13}$$

is the maximum likelihood estimate of $p_{jk}$. We see here that $\hat p_{jk}$ differs somewhat
from (2.8). This difference is due to the fact that in the earlier section the
$n_{ij}(1)$ were random variables while in this section we assumed that the $n_{ij}(1)$
were nonrandom. Under the null hypothesis, $-2 \log \lambda$ has an asymptotic
$\chi^2$-distribution with $m^2(m - 1) - m(m - 1) = m(m - 1)^2$ degrees of freedom.

We observe that the likelihood ratio (3.12) resembles likelihood ratios ob- 
tained for problems relating to contingency tables. We shall now develop further 
this similarity to standard procedures for contingency tables. 

For a given $j$, the $n^{1/2}(\hat p_{ijk} - p_{ijk})$ have the same asymptotic distribution as
the estimates of multinomial probabilities for $m$ independent samples ($i = 1,
2, \ldots, m$). An $m \times m$ table, which has the same formal appearance as a
contingency table, can be used to represent the estimates $\hat p_{ijk}$ for a given $j$
and for $i, k = 1, 2, \ldots, m$. The null hypothesis is that $p_{ijk} = p_{jk}$ for $i = 1,
2, \ldots, m$, and the $\chi^2$-test of homogeneity seems appropriate. To test this
hypothesis, calculate

$$\chi_j^2 = \sum_{i,k} n_{ij}^* (\hat p_{ijk} - \hat p_{jk})^2 / \hat p_{jk}, \tag{3.14}$$

where

$$n_{ij}^* = \sum_{k} n_{ijk} = \sum_{k} \sum_{t=2}^{T} n_{ijk}(t) = \sum_{t=2}^{T} n_{ij}(t-1) = \sum_{t=1}^{T-1} n_{ij}(t). \tag{3.15}$$

If the hypothesis is true, $\chi_j^2$ has the usual limiting distribution with $(m - 1)^2$
degrees of freedom.


* The criterion (3.12) was written incorrectly in (6.35) of [1] and (4.10) of [2]. 





In continued analogy with Section 3.2, another test of the hypothesis of
homogeneity for $m$ independent samples from multinomial trials can be
obtained by use of the likelihood ratio criterion. We calculate

$$\lambda_j = \prod_{i,k} (\hat p_{jk} / \hat p_{ijk})^{n_{ijk}}, \tag{3.16}$$

which is formally similar to the likelihood ratio criterion. The asymptotic
distribution of $-2 \log \lambda_j$ is $\chi^2$ with $(m - 1)^2$ degrees of freedom.

The preceding remarks relating to the contingency table approach dealt with
a given value of $j$. Hence, the hypothesis can be tested separately for each
value of $j$.

Let us now consider the joint hypothesis that $p_{ijk} = p_{jk}$ for all $i, j, k = 1,
2, \ldots, m$. A test of this joint hypothesis can be obtained by computing the sum

$$\chi^2 = \sum_{j=1}^{m} \chi_j^2 = \sum_{j,i,k} n_{ij}^* (\hat p_{ijk} - \hat p_{jk})^2 / \hat p_{jk}, \tag{3.17}$$

which has the usual limiting distribution with $m(m - 1)^2$ degrees of freedom.
Similarly the test criterion based on (3.12) can be written

$$\sum_{j=1}^{m} -2 \log \lambda_j = -2 \log \lambda = 2 \sum_{i,j,k} n_{ijk} \log [\hat p_{ijk} / \hat p_{jk}] = 2 \sum_{i,j,k} n_{ijk} (\log \hat p_{ijk} - \log \hat p_{jk}). \tag{3.18}$$
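
A test of first order against second order along the lines of (3.17) and (3.18) might be coded as follows. This is a sketch under assumptions: the sequences are complete, states are coded 0, ..., m − 1, cells with zero pooled counts are skipped, and the function name `order_test` and the toy data are hypothetical.

```python
import math


def order_test(sequences, m):
    """Return (chi2, minus_2_log_lambda, df) for first order vs. second order."""
    # n[i][j][k] = n_ijk: triples (i, j, k) at times (t - 2, t - 1, t), t = 2..T.
    n = [[[0] * m for _ in range(m)] for _ in range(m)]
    for seq in sequences:
        for t in range(2, len(seq)):
            n[seq[t - 2]][seq[t - 1]][seq[t]] += 1
    chi2 = 0.0
    g2 = 0.0
    for j in range(m):
        col = [sum(n[i][j][k] for i in range(m)) for k in range(m)]  # sum_i n_ijk
        total = sum(col)
        for i in range(m):
            n_ij = sum(n[i][j])  # n_ij* of (3.15)
            if n_ij == 0:
                continue
            for k in range(m):
                if col[k] == 0:
                    continue  # pooled estimate of p_jk is 0; skipped by assumption
                p_jk = col[k] / total          # (3.13)
                p_ijk = n[i][j][k] / n_ij      # (3.11)
                chi2 += n_ij * (p_ijk - p_jk) ** 2 / p_jk
                if n[i][j][k] > 0:
                    g2 += 2 * n[i][j][k] * math.log(p_ijk / p_jk)
    return chi2, g2, m * (m - 1) ** 2


# Assumed toy data in which the state at t depends on the state at t - 2.
sequences = [[0, 0, 0], [0, 0, 0], [1, 0, 1], [1, 0, 1]]
chi2, g2, df = order_test(sequences, m=2)
```

The toy sample is far too small for the asymptotic theory to apply; it only shows how the counts enter the two statistics.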

The preceding remarks can be directly generalized for a chain of order $r$.
Let $p_{ij \cdots kl}$ ($i, j, \ldots, k, l = 1, 2, \ldots, m$) denote the transition probability of
state $l$ at time $t$, given state $k$ at time $t - 1$, $\ldots$, and state $j$ at time $t - r + 1$
and state $i$ at time $t - r$ ($t = r, r + 1, \ldots, T$). We shall test the null hypothesis
that the process is a chain of order $r - 1$ (that is, $p_{ij \cdots kl} = p_{j \cdots kl}$ for $i = 1,
2, \ldots, m$) against the alternate hypothesis that it is not an $r - 1$ but an $r$-order
chain.

Let $n_{ij \cdots kl}(t)$ denote the observed frequency of the states $i, j, \ldots, k, l$ at
the respective times $t - r, t - r + 1, \ldots, t - 1, t$, and let $n_{ij \cdots k}(t - 1) =
\sum_{l=1}^{m} n_{ij \cdots kl}(t)$. We assume here that the $n_{ij \cdots k}(r - 1)$ are nonrandom. The
maximum likelihood estimate of $p_{ij \cdots kl}$ is

$$\hat p_{ij \cdots kl} = n_{ij \cdots kl} / n_{ij \cdots k}^*, \tag{3.19}$$

where $n_{ij \cdots kl} = \sum_{t=r}^{T} n_{ij \cdots kl}(t)$ and

$$n_{ij \cdots k}^* = \sum_{l=1}^{m} n_{ij \cdots kl} = \sum_{t=r}^{T} n_{ij \cdots k}(t-1) = \sum_{t=r-1}^{T-1} n_{ij \cdots k}(t). \tag{3.20}$$


For a given set $j, \ldots, k$, the set $\hat p_{ij \cdots kl}$ will have the same asymptotic
distribution as estimates of multinomial probabilities for $m$ independent samples
($i = 1, 2, \ldots, m$), and may be represented by an $m \times m$ table. If the null hypothesis
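($p_{ij \cdots kl} = p_{j \cdots kl}$ for $i = 1, 2, \ldots, m$) is true, then the $\chi^2$-test of homogeneity
seems appropriate, and

$$\chi_{j \cdots k}^2 = \sum_{i,l} n_{ij \cdots k}^* (\hat p_{ij \cdots kl} - \hat p_{j \cdots kl})^2 / \hat p_{j \cdots kl}, \tag{3.21}$$

where

$$\hat p_{j \cdots kl} = \sum_{i=1}^{m} n_{ij \cdots kl} \Big/ \sum_{i=1}^{m} n_{ij \cdots k}^* = \sum_{t=r}^{T} n_{j \cdots kl}(t) \Big/ \sum_{t=r-1}^{T-1} n_{j \cdots k}(t), \tag{3.22}$$

has the usual limiting distribution with $(m - 1)^2$ degrees of freedom. We see
here that $\hat p_{j \cdots kl}$ differs somewhat from the maximum likelihood estimate for
$p_{j \cdots kl}$ for an $(r - 1)$-order chain (viz., $\sum_{t=r-1}^{T} n_{j \cdots kl}(t) / \sum_{t=r-2}^{T-1} n_{j \cdots k}(t)$). This
difference is due to the fact that the $n_{j \cdots kl}(r - 1)$, for an $(r - 1)$-order chain,
are assumed to be multinomial random variables with parameters $p_{j \cdots kl}$ while
in this paragraph we have assumed that the $n_{j \cdots kl}(r - 1)$ are fixed.
Since there are $m^{r-1}$ sets $j, \ldots, k$ ($j = 1, 2, \ldots, m$; $\ldots$; $k = 1, 2, \ldots, m$),
the sum $\sum_{j, \ldots, k} \chi_{j \cdots k}^2$ will have the usual limiting distribution with $m^{r-1}(m - 1)^2$
degrees of freedom when the joint null hypothesis ($p_{ij \cdots kl} = p_{j \cdots kl}$ for $i =
1, 2, \ldots, m$ and all values from 1 to $m$ of $j, \ldots, k$) is true.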
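Another test of the null hypothesis can be obtained by use of the likelihood
ratio criterion

$$\lambda_{j \cdots k} = \prod_{i,l} (\hat p_{j \cdots kl} / \hat p_{ij \cdots kl})^{n_{ij \cdots kl}}, \tag{3.23}$$

where $-2 \log \lambda_{j \cdots k}$ is distributed asymptotically as $\chi^2$ with $(m - 1)^2$ degrees
of freedom. Also,

$$\sum_{j, \ldots, k} \{-2 \log \lambda_{j \cdots k}\} \tag{3.24}$$

has a limiting $\chi^2$-distribution with $m^{r-1}(m - 1)^2$ degrees of freedom when the
joint null hypothesis is true (see [10]).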

In the special case where r = 1, the test is of the null hypothesis that ob- 
servations at successive time points are statistically independent against the 
alternate hypothesis that observations are from a first-order chain. 

The reader will note that the method used to test the null hypothesis that
the process is a chain of order $r - 1$ against the alternate hypothesis that it
is of order $r$ can be generalized to test the null hypothesis that the process is of
order $u$ against the alternate hypothesis that it is of order $r$ ($u < r$). By an
approach similar to that presented earlier in this section, we can compute the
$\chi^2$-criterion or $-2$ times the logarithm of the likelihood ratio and observe that
these statistics are distributed asymptotically as $\chi^2$ with $[m^r - m^u](m - 1)$
degrees of freedom when the null hypothesis is true.
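
The degrees of freedom quoted above can be checked against the special cases given earlier in this section. In the small sketch below, the function name `df_order_test` is an assumption; the formula $(m^r - m^u)(m - 1)$ is the one stated in the text.

```python
def df_order_test(m, u, r):
    """Degrees of freedom (m^r - m^u)(m - 1) for testing order u against order r."""
    if not 0 <= u < r:
        raise ValueError("need 0 <= u < r")
    return (m ** r - m ** u) * (m - 1)


# Order r - 1 against order r should give m^(r-1) (m - 1)^2.
m, r = 3, 2
print(df_order_test(m, r - 1, r))  # 12, equal to 3 * (3 - 1)**2
```

Taking `u = 0` and `r = 1` gives $(m - 1)^2$, the count for testing independence against a first-order chain.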

In this section, we have assumed that the transition probabilities are the
same for each time interval, that is, stationary. It is possible to test the null
hypothesis that the $r$th order chain has stationary transition probabilities
using methods that are straightforward generalizations of the tests presented
in the previous section for the special case of a first-order chain.


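3.4. Test of the hypothesis that several samples are from the same Markov
chain of a given order. The general approach presented in the previous sections
can be used to test the null hypothesis that $s$ ($s \ge 2$) samples are from the same
$r$th order Markov chain; that is, that the $s$ processes are identical.

Let $\hat p_{ij \cdots kl}^{(h)} = n_{ij \cdots kl}^{(h)} / n_{ij \cdots k}^{*(h)}$ denote the maximum likelihood estimate of the
$r$th order transition probability $p_{ij \cdots kl}^{(h)}$ for the process from which sample $h$
($h = 1, 2, \ldots, s$) was obtained. We wish to test the null hypothesis that $p_{ij \cdots kl}^{(h)} =
p_{ij \cdots kl}$ for $h = 1, 2, \ldots, s$. Using the approach presented herein, it follows that

$$\chi_{ij \cdots k}^2 = \sum_{h,l} n_{ij \cdots k}^{*(h)} \left(\hat p_{ij \cdots kl}^{(h)} - \hat p_{ij \cdots kl}^{(\cdot)}\right)^2 / \hat p_{ij \cdots kl}^{(\cdot)}, \tag{3.25}$$

where $n_{ij \cdots kl}^{(\cdot)} = \sum_h n_{ij \cdots kl}^{(h)}$ and $\hat p_{ij \cdots kl}^{(\cdot)} = n_{ij \cdots kl}^{(\cdot)} / \sum_h n_{ij \cdots k}^{*(h)}$, has the usual
limiting distribution with $(s - 1)(m - 1)$ degrees of freedom. Also, $\sum_{i,j,\ldots,k}
\chi_{ij \cdots k}^2$ has a limiting $\chi^2$-distribution with $m^r(s - 1)(m - 1)$ degrees of freedom.

When $s = 2$, $\chi_{ij \cdots k}^2$ can be rewritten in the form

$$\chi_{ij \cdots k}^2 = \sum_{l} c_{ij \cdots k}^{-1} \left(\hat p_{ij \cdots kl}^{(1)} - \hat p_{ij \cdots kl}^{(2)}\right)^2 / \hat p_{ij \cdots kl}^{(\cdot)}, \tag{3.26}$$

where $\hat p_{ij \cdots kl}^{(\cdot)}$ is the estimate of $p_{ij \cdots kl}$ obtained by pooling the data in the two
samples, and $c_{ij \cdots k} = (1/n_{ij \cdots k}^{*(1)}) + (1/n_{ij \cdots k}^{*(2)})$. Also, $\sum_{i,j,\ldots,k} \chi_{ij \cdots k}^2$ has the
usual limiting distribution with $m^r(m - 1)$ degrees of freedom in the two sample
case.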

Analogous results can also be obtained using the likelihood-ratio criterion. 


3.5. A test involving two sets of states. In the case of panel studies, a person
is usually asked several questions. We might classify each individual according
to his opinion on two different questions. In an example in [2], one classification
indicated whether a person saw the advertisement of a certain product and the
other whether he bought the product in a certain time interval. Let the state
be denoted $(\alpha, \beta)$, $\alpha = 1, \ldots, A$ and $\beta = 1, \ldots, B$ where $\alpha$ denotes the first
opinion or class and $\beta$ the second. We assume that the sequence of states satisfies
a first-order Markov chain with transition probabilities $p_{\alpha\beta,\mu\nu}$. We ask whether
the sequence of changes in one classification is independent of that in the second.
For example, if a person notices an advertisement, is he more likely to buy the
product? The null hypothesis of independence of changes is

$$p_{\alpha\beta,\mu\nu} = q_{\alpha\mu}\, r_{\beta\nu}, \qquad \alpha, \mu = 1, \ldots, A; \quad \beta, \nu = 1, \ldots, B, \tag{3.27}$$

where $q_{\alpha\mu}$ is a transition probability for the first classification and $r_{\beta\nu}$ is for the
second. We shall find the likelihood ratio criterion for testing this null hypothesis.
Let $n_{\alpha\beta,\mu\nu}(t)$ be the number of individuals in state $(\alpha, \beta)$ at $t - 1$ and $(\mu, \nu)$
at $t$. From the previous results, the maximum likelihood estimate of $p_{\alpha\beta,\mu\nu}$,
when the null hypothesis is not assumed, is

$$\hat p_{\alpha\beta,\mu\nu} = \frac{n_{\alpha\beta,\mu\nu}}{\sum_{g=1}^{A} \sum_{h=1}^{B} n_{\alpha\beta,gh}}, \tag{3.28}$$

where $n_{\alpha\beta,\mu\nu} = \sum_{t=1}^{T} n_{\alpha\beta,\mu\nu}(t)$. When the null hypothesis is assumed, the
maximum likelihood estimate of $p_{\alpha\beta,\mu\nu}$ is $\hat q_{\alpha\mu} \hat r_{\beta\nu}$, where


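$$\hat q_{\alpha\mu} = \frac{\sum_{\beta,\nu=1}^{B} n_{\alpha\beta,\mu\nu}}{\sum_{g=1}^{A} \sum_{\beta,\nu=1}^{B} n_{\alpha\beta,g\nu}}, \tag{3.29}$$

$$\hat r_{\beta\nu} = \frac{\sum_{\alpha,\mu=1}^{A} n_{\alpha\beta,\mu\nu}}{\sum_{h=1}^{B} \sum_{\alpha,\mu=1}^{A} n_{\alpha\beta,\mu h}}. \tag{3.30}$$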


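The likelihood ratio criterion is

$$\lambda = \prod_{t=1}^{T} \prod_{\alpha,\mu=1}^{A} \prod_{\beta,\nu=1}^{B} \left(\frac{\hat q_{\alpha\mu}\, \hat r_{\beta\nu}}{\hat p_{\alpha\beta,\mu\nu}}\right)^{n_{\alpha\beta,\mu\nu}(t)}. \tag{3.31}$$

Under the null hypothesis, $-2 \log \lambda$ has an asymptotic $\chi^2$-distribution, and
the number of degrees of freedom is $AB(AB - 1) - A(A - 1) - B(B - 1) =
(A - 1)(B - 1)(AB + A + B)$.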
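
The count of degrees of freedom can be read as the number of free parameters in the unrestricted chain, $AB(AB - 1)$, minus the number under the null hypothesis, $A(A - 1) + B(B - 1)$. The sketch below checks numerically that this difference agrees with the factored form; the function name `df_independence` is an assumption.

```python
def df_independence(A, B):
    """AB(AB - 1) - A(A - 1) - B(B - 1): degrees of freedom for the test of (3.27)."""
    unrestricted = A * B * (A * B - 1)   # each of AB rows has AB - 1 free entries
    under_null = A * (A - 1) + B * (B - 1)
    return unrestricted - under_null


# Compare with the factored form (A - 1)(B - 1)(AB + A + B) on a small grid.
for A in range(1, 7):
    for B in range(1, 7):
        assert df_independence(A, B) == (A - 1) * (B - 1) * (A * B + A + B)

print(df_independence(2, 2))  # 8
```

For two dichotomous classifications ($A = B = 2$) the test therefore has 8 degrees of freedom.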


4. A modified model. In the preceding sections, we assumed that the $n_i(0)$
were nonrandom. An alternative is that the $n_i(0)$ are distributed multinomially
with probability $\eta_i$ and sample size $n$. Then the distribution of the set $n_{ij}(t)$
is (2.5) multiplied by the marginal distribution of the set $n_i(0)$ which is

$$\frac{n!}{\prod_{i=1}^{m} n_i(0)!} \prod_{i=1}^{m} \eta_i^{\,n_i(0)}. \tag{4.1}$$

In this model, the maximum likelihood estimate of $p_{ij}$ is again (2.8), and the
maximum likelihood estimate of $\eta_i$ is

$$\hat \eta_i = \frac{n_i(0)}{n}. \tag{4.2}$$


The means, variances, and covariances of $n_{ij}(t) - n_i(t - 1)p_{ij}$ are found by
taking the expected values of (2.20) to (2.23); the same formulas apply with
$n_k(0)$ replaced by $n\eta_k$. Also $n_{ij}(t) - n_i(t - 1)p_{ij}$ are uncorrelated with $n_k(0)$.
Since $n_k(0)/n$ estimates $\eta_k$ consistently, the asymptotic variances and covariances
of $n^{1/2}(\hat p_{ij} - p_{ij})$ are as in Section 2.4. It follows from these facts that the
asymptotic theory of the tests given in Section 3 holds for this modified model.

The asymptotic variances and covariances simplify somewhat if the chain
starts from a stationary state; that is, if

$$\sum_{k=1}^{m} \eta_k\, p_{ki} = \eta_i. \tag{4.3}$$





For then $\sum_{k} \eta_k\, p_{ki}^{[t-1]} = \eta_i$ and $\phi_i = T\eta_i$. If it is known that the chain starts
from a stationary state, equations (4.3) should be of some additional use in the
estimation of $p_{ij}$ when knowledge of the $\eta_i$, or even estimates of the $\eta_i$, are
available. We have dealt in this paper with the more general case where it is
not known whether (4.3) holds, and have used the maximum likelihood
estimates for this case. The estimates obtained for the more general case are not
efficient in the special case of a chain in a stationary state because relevant
information is ignored. In the special case, the maximum likelihood estimates
for the $\eta_i$ and $p_{ij}$ are obtained by maximizing $\log L = \sum n_{ij} \log p_{ij} + \sum n_i(0)
\log \eta_i$ subject to the restrictions $\sum_j p_{ij} = 1$, $\sum_i \eta_i p_{ij} = \eta_j$, $\sum_i \eta_i = 1$, $p_{ij} \ge
0$, $\eta_i \ge 0$. In the case of a chain in a stationary state where the $\eta_i$ are known,
the maximum likelihood estimates for the $p_{ij}$ are obtained by maximizing
$\sum n_{ij} \log p_{ij}$ subject to the restrictions $\sum_j p_{ij} = 1$, $\sum_i \eta_i p_{ij} = \eta_j$, $p_{ij} \ge 0$.
Lagrange multipliers can be used to obtain the equations for the maximum
likelihood estimates.


5. One observation on a chain of great length. In the previous sections,
asymptotic results were presented for $n_i(0) \to \infty$, and hence $\sum_i n_i(0) =
n \to \infty$, while $T$ was fixed. The case of one observed sequence of states ($n = 1$)
has been studied by Bartlett [3] and Hoel [10], and they consider the asymptotic
theory when the number of times of observation increases ($T \to \infty$). Bartlett
has shown that the number $n_{ij}$ of times that the observed sequence was in
state $i$ at time $t - 1$ and in state $j$ at time $t$, for $t = 1, \ldots, T$, is asymptotically
normally distributed in the 'positively regular' situation (see [3], p. 91). He also
has shown ([3], p. 93) that the maximum likelihood estimates $\hat p_{ij} = n_{ij}/n_i^*$
($n_i^* = \sum_{j=1}^{m} n_{ij}$) have asymptotic variances and covariances given by the usual
multinomial formulas appropriate to $\mathcal{E} n_i^*$ independent observations ($i = 1,
2, \ldots, m$) from multinomial probabilities $p_{ij}$ ($j = 1, 2, \ldots, m$), and that the
asymptotic covariances for two different values of $i$ are 0. An argument like
that of Section 2.4 shows that the variables $(n_i^*)^{1/2}(\hat p_{ij} - p_{ij})$ have a limiting
normal distribution with means 0 and the variances and covariances given in
Section 2.4. This result was proved in a different way by L. A. Gardner [8].

Thus we see that the asymptotic theory for $T \to \infty$ and $n = 1$ is essentially
the same as for $T$ fixed and $n_i(0) \to \infty$. Hence, the same test procedures are
valid except for such tests as on possibly nonstationary chains. For example,
Hoel's likelihood ratio criterion [10] to test the null hypothesis that the order
of the chain is $r - 1$ against the alternate hypothesis that it is $r$ is parallel to
the likelihood ratio criterion for this test given in Section 3.3. The $\chi^2$-test for
this hypothesis, and the generalizations of the tests to the case where the null
hypothesis is that the process is of order $u$ and the alternate hypothesis is that
the process is of order $r$ ($u < r$), which are presented in Section 3.3, are also
applicable for large $T$. Also, the $\chi^2$-test presented in Section 3.1 can be generalized
to provide an alternative to Bartlett's likelihood ratio criterion [3] for testing
the null hypothesis that $p_{ij \cdots kl} = p_{ij \cdots kl}^0$ (specified).







6. $\chi^2$-tests and likelihood ratio criteria. The $\chi^2$-tests presented in this paper
are asymptotically equivalent, in a certain sense, to the corresponding likelihood
ratio tests, as will be proved in this section. This fact does not seem to follow
from the general theory of $\chi^2$-tests; the $\chi^2$-tests presented herein are different
from those $\chi^2$-tests that can be obtained directly by considering the number of
individuals in each of the $m^{T+1}$ possible mutually exclusive sequences (see Section
2.1) as the multinomial variables of interest. The $\chi^2$-tests based on $m^{T+1}$ categories
need not consider the data as having been obtained from a Markov chain and
the alternate hypothesis may be extremely general, while the $\chi^2$-tests presented
herein are based on a Markov chain model.

For small samples, not enough data has been accumulated to decide which 
tests are to be preferred (see comments in [5]). The relative rate of approach 
to the asymptotic distributions and the relative power of the tests for small 
samples is not known. In this section, a method somewhat related to the rela- 
tive power will be tentatively suggested for deciding which tests are to be pre- 
ferred when the sample size is moderately large and there is a specific alternate 
hypothesis. An advantage of the $\chi^2$-tests, which are of the form used in
contingency tables, is that, for many users of these methods, their motivation
and their application seem to be simpler.

We shall now prove that the likelihood ratio and the $\chi^2$-tests (tests of
homogeneity) presented in Section 3.2 are asymptotically equivalent in a certain
sense. First, we shall show that the $\chi^2$-statistic has an asymptotic $\chi^2$-distribution
under the null hypothesis. The method of proof can be used whenever the
relevant $\hat p$'s have the appropriate limiting normal distribution. In particular,
this will be true for statistics of the form $\chi_i^2$ (see (3.6)). In order to prove that
statistics of the form $\lambda_i$ (see (3.7)), which are formally similar to the likelihood
ratio criterion but are not actually likelihood ratios, have the appropriate
asymptotic distribution, we shall then show that $-2 \log \lambda_i$ is asymptotically
equivalent to the $\chi_i^2$-statistic, and therefore it has an asymptotic $\chi^2$-distribution
under the null hypothesis. Then we shall discuss the question of the equivalence
of the tests under the alternate hypothesis. The method of proof presented
here can be applied to the appropriate statistics given in the other sections
herein, and also where $T \to \infty$ as well as where $n \to \infty$.

Let us consider the distribution of the $\chi^2$-statistic (3.8) under the null
hypothesis. From Section 2.4, we see that $n^{1/2}(\hat p_{ij}(t) - p_{ij})$ are asymptotically
normally distributed with means 0 and variances $p_{ij}(1 - p_{ij})/m_i(t - 1)$, etc.,
where $m_i(t) = \mathcal{E} n_i(t)/n$. For different $t$ or different $i$, they are asymptotically
independent. Then the $[n m_i(t - 1)]^{1/2}[\hat p_{ij}(t) - p_{ij}]$ have asymptotically
variances $p_{ij}(1 - p_{ij})$, etc. Let $p^*_{ij} = \sum_t m_i(t - 1)\hat p_{ij}(t) / \sum_t m_i(t - 1)$. Then
by the usual $\chi^2$-theory, $\sum n\,m_i(t - 1)[\hat p_{ij}(t) - p^*_{ij}]^2/p^*_{ij}$ has an asymptotic
$\chi^2$-distribution under the null hypothesis. But


(6.1)  $\operatorname{p\,lim}\,(p^*_{ij} - \hat p_{ij}) = 0$







because 
(6.2)  $\operatorname{p\,lim}\,[n_i(t)/n - m_i(t)] = 0.$


From the convergence in probability of $(p^*_{ij} - \hat p_{ij})$ and $(m_i(t) - n_i(t)/n)$,
and the fact that $n^{1/2}(\hat p_{ij}(t) - p_{ij})$ has a limiting distribution, it follows that

(6.3)  $\operatorname{p\,lim}\left[ n \sum \frac{m_i(t - 1)(\hat p_{ij}(t) - p^*_{ij})^2}{p^*_{ij}} - \sum \frac{n_i(t - 1)(\hat p_{ij}(t) - \hat p_{ij})^2}{\hat p_{ij}} \right] = 0.$

Hence, the $\chi^2$-statistic has the same asymptotic distribution as
$\sum n\,m_i(t - 1)[\hat p_{ij}(t) - p^*_{ij}]^2/p^*_{ij}$; that is, a $\chi^2$-distribution. This proof also indicates that the
$\chi_i^2$-statistics (3.6) also have a limiting $\chi^2$-distribution. We shall now show that
$-2 \log \lambda_i$ (see (3.7)) is asymptotically equivalent to $\chi_i^2$ under the null hypothesis,
and hence will also have a limiting $\chi^2$-distribution.

We first note that for $|x| < \tfrac{1}{2}$

(6.4)  $(1 + x) \log (1 + x) = (1 + x)(x - x^2/2 + x^3/3 - x^4/4 + \cdots) = x + x^2/2 - (x^3/6)(1 - x/2 + \cdots),$

and

(6.5)  $|(1 + x) \log (1 + x) - x - x^2/2| = |(x^3/6)(1 - x/2 + \cdots)| \le |x|^3$

(see p. 217 in [6]). We see also that


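(6.6)
$$\begin{aligned}
-2 \log \lambda_i &= -2 \sum_{j,t} n_{ij}(t) \log [\hat p_{ij}/\hat p_{ij}(t)] \\
&= 2 \sum_{j,t} n_i(t - 1)\,\hat p_{ij}(t) \log [\hat p_{ij}(t)/\hat p_{ij}] \\
&= 2 \sum_{j,t} n_i(t - 1)\,\hat p_{ij}\,[1 + x_{ij}(t)] \log [1 + x_{ij}(t)],
\end{aligned}$$

where $x_{ij}(t) = [\hat p_{ij}(t) - \hat p_{ij}]/\hat p_{ij}$. The difference $\Delta$ between $-2 \log \lambda_i$ and the
$\chi_i^2$-statistic is

(6.7)  $\Delta = -2 \log \lambda_i - \chi_i^2 = 2 \sum_{j,t} n_i(t - 1)\,\hat p_{ij}\,\{[1 + x_{ij}(t)] \log [1 + x_{ij}(t)] - [x_{ij}(t)]^2/2\}.$

Since $\sum_j \hat p_{ij}\,x_{ij}(t) = 0$,

(6.8)  $\Delta = 2 \sum_{j,t} n_i(t - 1)\,\hat p_{ij}\,\{[1 + x_{ij}(t)] \log [1 + x_{ij}(t)] - x_{ij}(t) - [x_{ij}(t)]^2/2\}.$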

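We shall show that $\Delta$ converges to 0 in probability; i.e., for any $\epsilon > 0$, the
probability of the relation $|\Delta| < \epsilon$, under the null hypothesis, tends to unity as
$n = \sum_i n_i(t) \to \infty$. The probability satisfies the relation

(6.9)
$$\begin{aligned}
\Pr\{|\Delta| < \epsilon\} &\ge \Pr\{|\Delta| < \epsilon \text{ and } |x_{ij}(t)| < \tfrac{1}{2}\} \\
&\ge \Pr\{2 \textstyle\sum_{j,t} n_i(t - 1)\,\hat p_{ij}\,|x_{ij}(t)|^3 < \epsilon \text{ and } |x_{ij}(t)| < \tfrac{1}{2}\} \\
&\ge \Pr\{2n \textstyle\sum_{j,t} |x_{ij}(t)|^3 < \epsilon \text{ and } |x_{ij}(t)| < \tfrac{1}{2}\}.
\end{aligned}$$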







It is therefore necessary only to prove that $n[x_{ij}(t)]^3$ converges to 0 in
probability. Since $x_{ij}(t) = [\hat p_{ij}(t) - \hat p_{ij}]/\hat p_{ij}$ converges to zero in probability under
the null hypothesis, and

(6.10)  $\sqrt{x_{ij}(t)\,n}\;x_{ij}(t) = \sqrt{x_{ij}(t)\,n}\left[\frac{\hat p_{ij}(t) - p_{ij}}{\hat p_{ij}} - \frac{\hat p_{ij} - p_{ij}}{\hat p_{ij}}\right],$

it follows that

(6.11)  $n[x_{ij}(t)]^3 = [(x_{ij}(t)\,n)^{1/2}\,x_{ij}(t)]^2$

converges to zero in probability when the null hypothesis is true. Q.E.D.

Since the $\chi^2$-statistic has a limiting $\chi^2$-distribution under the null hypothesis,
and $\Delta = -2 \log \lambda_i - \chi_i^2$ converges in probability to zero, $-2 \log \lambda_i = \chi_i^2 + \Delta$
has a limiting $\chi^2$-distribution under the null hypothesis.
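
The closeness of the two statistics under the null hypothesis can also be seen numerically. The following Python sketch is an illustration only and is not part of the original argument: it simulates independent chains with a constant transition matrix and evaluates both statistics, summed over the rows $i$, in the forms appearing in (6.3) and (6.6). The transition matrix, the sample size, the uniform initial distribution, the seed, and the function names are all assumptions chosen for the example.

```python
import math
import random


def simulate_counts(P, n, T, seed=0):
    """Simulate n independent chains for T transitions under a constant matrix P.

    Returns counts[t][i][j], the number of chains in state i at one time point
    and in state j at the next (the text's n_ij(t + 1) for t = 0, ..., T - 1).
    """
    rng = random.Random(seed)
    m = len(P)
    counts = [[[0] * m for _ in range(m)] for _ in range(T)]
    for _ in range(n):
        state = rng.randrange(m)  # uniform initial state (assumption)
        for t in range(T):
            nxt = rng.choices(range(m), weights=P[state])[0]
            counts[t][state][nxt] += 1
            state = nxt
    return counts


def homogeneity_statistics(counts):
    """Return (chi-square, -2 log lambda) for the hypothesis of constant p_ij."""
    T, m = len(counts), len(counts[0])
    chi2 = 0.0
    lr = 0.0
    for i in range(m):
        row_totals = [sum(counts[t][i]) for t in range(T)]  # n_i(t - 1)
        pooled_total = sum(row_totals)
        for j in range(m):
            pooled = sum(counts[t][i][j] for t in range(T))
            if pooled == 0:
                continue  # every term for this (i, j) vanishes
            p_hat = pooled / pooled_total  # pooled estimate of p_ij
            for t in range(T):
                if row_totals[t] == 0:
                    continue
                p_t = counts[t][i][j] / row_totals[t]  # estimate of p_ij(t)
                chi2 += row_totals[t] * (p_t - p_hat) ** 2 / p_hat
                if p_t > 0:
                    lr += 2 * counts[t][i][j] * math.log(p_t / p_hat)
    return chi2, lr
```

For instance, `homogeneity_statistics(simulate_counts([[0.7, 0.3], [0.4, 0.6]], n=5000, T=3, seed=1))` returns two values whose difference should be small compared with either one, in line with $\Delta \to 0$.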

The method presented herein for showing the asymptotic equivalence of
$-2 \log \lambda_i$ and $\chi_i^2$ could also be used to show the asymptotic equivalence of
statistics of the form $-2 \log \lambda$ and $\chi^2$. It was proved in Section 3.2 that, under
the null hypothesis, $-2 \log \lambda$ has a limiting $\chi^2$-distribution with $m(m - 1)(T - 1)$
degrees of freedom. (The proof in Section 3.2 applied to $\lambda$, a likelihood
ratio criterion, but would not apply to $\lambda_i$ since they are not actually likelihood
ratios.) Hence, we have another proof that the $\chi^2$-statistic has the same limiting
distribution as the likelihood ratio criterion under the null hypothesis.

The previous remarks refer to the case where the null hypothesis is true. 
Now suppose the alternate hypothesis is true; that is, $p_{ij}(t) \ne p_{ij}(s)$ for some
$t, s, i, j$. It is easy to see that both the $\chi^2$-test and the likelihood ratio test are
consistent under any alternate hypothesis. In other words, if the values of $p_{ij}(t)$
for the alternate hypothesis and the significance level are kept fixed, then as $n$
increases, the power of each test tends to 1 (see [5] and [11]).

In order to examine the situation in which the power is not close to 1 in large 
samples and also to make comparisons between tests, the alternate hypothesis 
may be moved closer to the null hypothesis as n increases. If the values of 
$p_{ij}(t)$ for the alternate hypothesis are not fixed but move closer to the null
hypothesis, it can be seen that the two tests are again asymptotically equiva- 
lent. This can be deduced by a slight modification of the proof of asymptotic 
equivalence under the null hypothesis given in this section (see also [5], p. 323). 

We shall now suggest another approach to the comparison of these tests when 
the alternate hypothesis is kept fixed. Since the null hypothesis is rejected 
when an appropriate statistic ($\chi^2$ or $-2 \log \lambda$) exceeds a specified critical value,
we might decide that the $\chi^2$-test is to be preferred to the likelihood ratio test
if the statistic $\chi^2$ is in some sense (stochastically) larger than $-2 \log \lambda$ under
the alternate hypothesis. 

Since $n_i(t)$ is a linear combination of multinomial variables, we see that
$n_i(t)/n$ converges in probability to its expected value $\mathcal{E}[n_i(t)/n] = m_i(t)$. Hence,
$\chi^2/n$ converges in probability to

(6.12)  $\sum_{i,j,t} m_i(t - 1)[p_{ij}(t) - \bar p_{ij}]^2/\bar p_{ij},$







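and $(-2 \log \lambda)/n$ converges in probability to

(6.13)  $2 \sum_{i,j,t} m_i(t - 1)\,p_{ij}(t) \log [p_{ij}(t)/\bar p_{ij}],$

where

(6.14)  $\bar p_{ij} = \sum_t p_{ij}(t)\,m_i(t - 1) \big/ \sum_t m_i(t - 1) = \operatorname{p\,lim}_{n \to \infty} \hat p_{ij}.$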


The difference between (6.12) and (6.13) is approximately

(6.15)  $\sum_{i,j,t} m_i(t - 1)[p_{ij}(t) - \bar p_{ij}]^3/(3 \bar p_{ij}^2).$

Under the alternate hypothesis, these two stochastic limits differ from 0, 
and computation of them suggests which test is better. If $(p_{ij}(t) - \bar p_{ij})/\bar p_{ij}$ is
small, then there will be only a small difference between the two limits. When
the alternative is some composite hypothesis, as is usually the case when $\chi^2$-tests
are applied, then these stochastic limits can be computed and compared
for the simple alternatives that are included in the alternate hypothesis. 
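
As a sketch of how such a computation might be organized for one simple alternative, the Python function below evaluates the two stochastic limits and the cubic term approximating their difference. It is only an illustration: the function name, the nested-list layout of the inputs, and the numerical values in the usage note are assumptions, not taken from the paper.

```python
import math


def stochastic_limits(m, p):
    """Limits of chi2/n and (-2 log lambda)/n under a fixed alternative.

    m[i][t] holds the weight m_i(t - 1) attached to period t, and p[t][i][j]
    holds p_ij(t) under the alternative (all entries assumed positive).
    Returns (chi-square limit, likelihood ratio limit, cubic approximation
    to their difference).
    """
    states = range(len(m))
    periods = range(len(p))
    chi_lim = lr_lim = cubic = 0.0
    for i in states:
        weight = sum(m[i][t] for t in periods)
        for j in states:
            # Weighted average of p_ij(t) over the periods (the limit of the
            # pooled estimate).
            p_bar = sum(m[i][t] * p[t][i][j] for t in periods) / weight
            for t in periods:
                d = p[t][i][j] - p_bar
                chi_lim += m[i][t] * d ** 2 / p_bar
                lr_lim += 2 * m[i][t] * p[t][i][j] * math.log(p[t][i][j] / p_bar)
                cubic += m[i][t] * d ** 3 / (3 * p_bar ** 2)
    return chi_lim, lr_lim, cubic
```

With, say, `m = [[0.3, 0.4], [0.7, 0.6]]` and `p = [[[0.5, 0.5], [0.4, 0.6]], [[0.6, 0.4], [0.4, 0.6]]]`, the relative deviations are about 0.1, and the difference of the two limits should agree with the cubic term up to terms of fourth order in those deviations.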

This method for comparing tests is somewhat related to Cochran’s comment 
(see p. 323 in [5]) that either (a) the significance probability can be made to 
decrease as n increases, thus reducing the chance of an error of type I, or (b) 
the alternate hypothesis can be moved steadily closer to the null hypothesis.
Method (b) was discussed in [3]. If method (a) is used, then the critical value
of the statistic ($\chi^2$ or $-2 \log \lambda$) will increase as $n$ increases. When the critical
value has the form $cn$, where $c$ is a constant (there may be some question as
to whether this form for the critical value is really suitable), we see from the 
remarks in the preceding paragraph that the power of a test will tend to 1 if 
c is less than the stochastic limit and it will tend to 0 if c is greater than the 
stochastic limit. Hence, by this approach we find that the power of the $\chi^2$-test
can be quite different from the power of the likelihood ratio test, and some 
approximate computations can suggest which test is to be preferred. 

However, a more appealing approach is to vary the significance level so the 
ratio of significance level to the probability of some particular Type II error 
approaches a limit (or at least it seems that desirable sequences of significance 
points lie between c’ and cn). While the usual asymptotic theory does not give 
enough information to handle this problem, the comparison of stochastic limits 
may suggest a comparison of powers. 

The methods of comparison discussed herein can also be used in the study of 
the $\chi^2$ and likelihood ratio methods for ordinary contingency tables. We have
seen that, in a certain sense, the $\chi^2$ and likelihood ratio methods are not equivalent
when the alternate hypothesis is true and fixed, and we have suggested a
method for determining which test is to be preferred. 


REFERENCES 


[1] T. W. Anderson, "Probability models for analyzing time changes in attitudes,"
RAND Research Memorandum No. 455, 1951.

[2] T. W. Anderson, "Probability models for analyzing time changes in attitudes,"
Mathematical Thinking in the Social Sciences, edited by Paul F. Lazarsfeld,
The Free Press, Glencoe, Illinois, 1954.

[3] M. S. Bartlett, "The frequency goodness of fit test for probability chains," Proc.
Cambridge Philos. Soc., Vol. 47 (1951), pp. 86-95.

[4] H. Chernoff, "Large-sample theory: parametric case," Ann. Math. Stat., Vol. 27
(1956), pp. 1-22.

[5] W. G. Cochran, "The $\chi^2$-test of goodness of fit," Ann. Math. Stat., Vol. 23 (1952),
pp. 315-345.

[6] H. Cramér, Mathematical Methods of Statistics, Princeton University Press,
Princeton, 1946.

[7] W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 1, John
Wiley and Sons, New York, 1950.

[8] L. A. Gardner, Jr., "Some estimation and distribution problems in information
theory," Master's Essay, Columbia University Library, 1954.

[9] L. A. Goodman, "On the statistical analysis of Markov chains" (abstract), Ann.
Math. Stat., Vol. 26 (1955), p. 771.

[10] P. G. Hoel, "A test for Markoff chains," Biometrika, Vol. 41 (1954), pp. 430-433.

[11] J. Neyman, "Contribution to the theory of the $\chi^2$-test," Proceedings of the Berkeley
Symposium on Mathematical Statistics and Probability, University of California
Press, Berkeley, 1949, pp. 239-274.





MULTISTAGE STATISTICAL DECISION PROCEDURES¹


By M. A. Girshick, S. Karlin, and H. L. Royden
Stanford University


1. Introduction. A class of problems which arise in a variety of forms can be 
formulated as follows: We are requested to make periodic decisions of the same 
type but based on an increasing amount of information. Suppose we have a 
collection $D_k$ of decision procedures for the $k$th stage, given the amount of
information available to us at that stage, and suppose that the procedures of $D_k$
are admissible under the assumption that the $k$th-stage decision is all that is
required of us. Is it then true that when we have to prescribe decision procedures
$d_1, d_2, \dots, d_n$ for each stage, we obtain an admissible class by taking
an arbitrary procedure from each $D_k$? The answer turns out to be no in a large
class of such decision problems which we consider here. This means that by plan- 
ning our whole sequence of decision procedures in advance we are able to do 
better on an average than if we were to make each decision as it arises. The 
present paper is devoted to the problem of prescribing rules which tell how the 
single-stage decision procedures $d_k$ should interlock with one another so as to
give a minimal complete class of decision procedures for the multistage statistical 
decision problem. This problem is similar to the classical sequential decision 
problems which do not fix in advance the number of stages. In many respects 
the problem formulated here is simpler than the classical sequential decision 
problem, and sharper results are obtained—e.g., minimal complete classes of 
statistical decision procedures are determined. 

In order to illustrate the nature of this type of decision problem and its analysis, 
we might look at the following simple example: A biased coin is tossed, and the 
player is required to call heads or tails, being paid one unit for a correct call and 
nothing for an incorrect one. The first call of the player is made in complete 
ignorance, but for the nth play he has the evidence of the first (n — 1) tosses 
on which to make his call. To prescribe the classes $D_n$ of strategies admissible
for a single stage is to consider the problem of making the $n$th call on the basis
of the first $(n - 1)$ outcomes, the first $(n - 1)$ calls having been forgotten.
We then have the game in which nature chooses the bias $p$ on the coin, the
player observes a random variable $z$ binomially distributed with parameter $p$
and must choose between two actions with loss function $-p$ and $(p - 1)$. Here
a decision procedure or strategy for the player consists of a function $\varphi(z)$ which
gives the probability² of calling heads if $z$ is the number of heads that have
previously been observed. Problems of this sort have been considered in [1] 


Received October 7, 1955; revised June 29, 1956. 

1 This work was sponsored by the Army, Navy, and Air Force through the Joint Services 
Advisory Committee for Research Groups in Applied Mathematics and Statistics. 

² We are thus allowing the player the use of a randomized strategy.




and [2], and it is known that each decision procedure is dominated by a "monotone"
procedure, i.e., one for which $\varphi(z) = 0$ for $z < t$, and $\varphi(z) = 1$ for $z > t$.

The global approach to our coin-tossing endeavor is to look at the whole 
series of tosses and calls as a single decision problem. Suppose for convenience 
that we agree at the beginning that our decision problem is to terminate after n 
tosses. Then a decision procedure consists of $n$ functions $\{\varphi_i\}$, the function $\varphi_i$
giving the probability of calling heads on the $i$th toss if $z_i$ heads have appeared
in the first $(i - 1)$ tosses. It is clear that we get a complete class of decision
procedures if we allow only those functions $\varphi_i$ which provide a monotone procedure
for the $i$th toss, but it seems possible (and is in fact the case) that we
might get a smaller complete class if we restricted ourselves to procedures $\varphi$
where the separate components $\varphi_i$ are related to one another in some fashion.
We shall see that a complete class is formed by those strategies for which $\varphi_i(z_i) = 0$
for $z_i < t_i$ and $\varphi_i(z_i) = 1$ for $z_i > t_i$, with the numbers $t_i$ having the property
that both $t_i$ and $(i - t_i)$ are increasing functions of $i$. If we impose suitable
restrictions on $\varphi_i(t_i)$, then we get not only a complete but also an admissible
class. 

Aside from the many straightforward statistical problems involving multistage 
decisions, there arises in inventory analysis an important class of examples where 
repeated decisions must be made in the face of uncertainty. Suppose the dis- 
tribution of demand of a commodity has a known parametric form with unknown 
parameter. This is a very common assumption. Decisions must be made period- 
ically (the first of each month, for example) for ordering a certain amount of 
the commodity to have in stock in order to meet demand. Certain costs are 
incurred for storage, for purchase of the commodity, and for not having enough 
on hand to satisfy demand. At the same time, the exact distribution is not 
known, and more information accumulates about these distributions as the 
various periods roll by. Here again we are faced with a multistage statistical 
decision problem of considerable importance, and it is of value to know how the 
decisions over the periods should be related to form an admissible procedure. 

Many other examples of the above type can be cited which involve making 
decisions over several stages where more information of the uncertainty evolves 
with time. This investigation represents a first attempt in analyzing the relation- 
ship of the various decisions in the several stages from the point of view of 
statistical decision theory. 

In the next section we describe a general class of games of this type and 
give theorems regarding complete and admissible classes. 


2. Description of a class of games and decision procedures for them. We 
consider games of the following sort: Nature chooses a point $\omega$ in an interval $\Omega$
of the real line (or is in an unknown state specified by $\omega$). Each play of the game
consists of the player choosing one of two actions and observing a random
variable $x$ whose cumulative distribution is of the exponential class, i.e., is
given by







(1)  $P(x \mid \omega) = \beta(\omega) \int_{-\infty}^{x} e^{\omega t}\,d\mu(t),$


where $\beta(\omega) > 0$ for $\omega \in \Omega$ and $\mu(t)$ is a $\sigma$-finite measure defined on the real line.
This family of distributions includes many well-known examples such as the 
Normal with known variance, Poisson, Gamma, and Binomial. 
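
To make the form (1) concrete, the short Python sketch below writes two of these families with an explicit $\beta(\omega)$ and base measure $\mu$. The choices of $\mu$ (counting measure on $\{0, 1\}$ for a single coin toss, and mass $1/x!$ at $x = 0, 1, 2, \dots$ for the Poisson case) and the function names are assumptions made for the illustration; they are standard parametrizations rather than formulas quoted from the paper.

```python
import math


def bernoulli_density(x, omega):
    """beta(omega) * exp(omega * x) with mu({0}) = mu({1}) = 1.

    Here beta(omega) = 1 / (1 + e^omega) and omega = log[p / (1 - p)].
    """
    return math.exp(omega * x) / (1.0 + math.exp(omega))


def poisson_density(x, omega):
    """beta(omega) * exp(omega * x) * mu({x}) with mu({x}) = 1 / x!.

    Here beta(omega) = exp(-e^omega) and omega is the log of the Poisson mean.
    """
    return math.exp(-math.exp(omega)) * math.exp(omega * x) / math.factorial(x)
```

With $p = 0.3$ and $\omega = \log(0.3/0.7)$, `bernoulli_density(1, omega)` recovers $p$ and `bernoulli_density(0, omega)` recovers $1 - p$; with $\omega = \log 2.5$, `poisson_density(3, omega)` is the usual Poisson probability of 3 for mean 2.5.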

In most games of this sort which arise in practice the loss on a play is given 
by a function $l_\nu(x)$ of the action $\nu$ taken by the player and the outcome $x$ of the
observation of the random variable. However, we shall only be interested in the
expected loss


$L_\nu(\omega) = \int l_\nu(x)\,dP(x \mid \omega),$


and it is on this that we place our restrictions. 

In the present article we consider only the case of two actions and require that 
there be a point $\omega_0$ such that $L_1(\omega) < L_2(\omega)$ for $\omega < \omega_0$ and $L_1(\omega) > L_2(\omega)$ for
$\omega > \omega_0$. This corresponds essentially to the usual one-sided statistical testing
hypothesis. The loss functions $L_\nu(\omega)$ are henceforth assumed to be sufficiently
regular to insure the existence of all the integrals involving them that we will
have occasion to consider. Furthermore, we assume that $L_1 - L_2$ has at most a
countable number of discontinuities of the first kind. This last assumption is
useful in connection with Theorem 2. 

If we take n observations of a random variable with distribution (1), then 
their sum $z_n$ is a sufficient statistic [1] and has the cumulative distribution

(2)  $P_n(z_n \mid \omega) = [\beta(\omega)]^n \int_{-\infty}^{z_n} e^{\omega t}\,d\mu_n(t),$

where $\mu_n(t)$ is the convolution of $\mu$ with itself $n$ times. Thus, if we look at the
$(n + 1)$-st decision by itself on the basis of the first $n$ observations, a decision
procedure consists of specifying the probability $\varphi_n(z_n)$ with which we take
action 1, having observed $z_n$. The risk on this play becomes

(3)  $\rho_n(\varphi_n \mid \omega) = [\beta(\omega)]^n \int e^{\omega z_n}[\varphi_n(z_n) L_1(\omega) + (1 - \varphi_n(z_n)) L_2(\omega)]\,d\mu_n(z_n).$

Various aspects of this fixed sample size game have been considered in [2] 
and a minimal complete class of decision procedures for this game is the class of
monotone procedures: those of the form $\varphi_n(z) = 0$ for $z < t_n$ and $\varphi_n(z) = 1$
for $z > t_n$. We speak of $t_n$ as the "critical number" for $\varphi_n$. If $\mu_n$ has a
discontinuity at $t_n$, then $z_n = t_n$ occurs with positive probability, and the value $\varphi_n(t_n)$
becomes important. In this case we shall refer to $\varphi_n(t_n)$ as the randomization
at $t_n$.
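
For the coin-tossing example of the Introduction, the risk (3) of a monotone procedure reduces to a finite sum over the binomial distribution of $z_n$. The Python sketch below is a hedged illustration of that sum: the function name and argument layout are assumptions, action 1 is taken to be "call heads", and the two loss functions $-p$ and $p - 1$ are supplied by the caller.

```python
import math


def stage_risk(n, p, t, lam, loss_heads, loss_tails):
    """Risk of a monotone procedure based on n tosses of a coin with bias p.

    The procedure calls heads with probability phi(z), where phi(z) = 0 for
    z < t, phi(z) = 1 for z > t, and phi(t) = lam (the randomization at t).
    """
    risk = 0.0
    for z in range(n + 1):
        prob_z = math.comb(n, z) * p ** z * (1 - p) ** (n - z)
        phi = 1.0 if z > t else (lam if z == t else 0.0)
        risk += prob_z * (phi * loss_heads(p) + (1 - phi) * loss_tails(p))
    return risk
```

For example, with three previous tosses, bias $p = 0.9$, and critical number $t = 1.5$, the call `stage_risk(3, 0.9, 1.5, 0.0, lambda p: -p, lambda p: p - 1)` weights the loss $-0.9$ by the probability of two or more heads and the loss $-0.1$ by the probability of fewer than two.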

If we now take a multistage view of the first $n$ plays of the game, a decision
procedure becomes a set $\varphi = \{\varphi_i\}$ (indexed from $i = 0$) consisting of the procedures
for each play, and the risk becomes

(4)  $\rho(\varphi \mid \omega) = \sum_i \rho_i(\varphi_i \mid \omega),$

where the sum starts at $i = 0$ and $\rho_i$ is given by (3).

Thus all of the decision procedures which we shall consider from now on 
are completely specified by the critical numbers $t_i$ and the randomizations
$\lambda_i = \varphi_i(t_i)$ at them.

A complete class of decision procedures is obtained if we restrict each $\varphi_i$
to be a monotone procedure. Two strategies are said to be equivalent if they
give the same value to the risk function (4) for each value of $\omega$. We shall not
regard equivalent strategies as distinct, although the functions $\{\varphi_i\}$ which
specify them may be different. For example, in the coin-tossing game, a strategy
which tells us to call heads on the fourth try if more than $1\tfrac{1}{2}$ heads have
occurred on the first three plays is equivalent to one which tells us to call heads
on the fourth if two or more heads have occurred in the first three plays. In
case the measure $d\mu$ in (1) is atomless, two strategies with the same critical
numbers are equivalent, and the randomizations are irrelevant. The specification
becomes unique if we restrict $t_i$ to the spectrum of $d\mu_i$ and require $\varphi_i(t_i) = 1$
unless $t_i$ is an atom of $d\mu_i$.


3. The principal theorems. We recall that the spectrum of a random variable
(or of its distribution) is the set of those points $z$ with the property that every
open interval containing $z$ is assigned positive probability. We define the range
$T$ of a random variable (or of its distribution) as the convex hull of its spectrum.
$T$ is automatically a closed interval, which we shall denote by $[a, b]$ where $a$ may
be $-\infty$ and $b$ may be $+\infty$.

THEOREM 1. Let $F$ be an a priori probability measure on $\Omega$ with sets of positive
measure above and below $\omega_0$. If $s = (\varphi_1, \dots, \varphi_n)$ is a Bayes procedure against $F$
with critical numbers $(t_1, \dots, t_n)$, then $t_i - t_{i-1}$ must be an interior point of $T$
for $i = 2, \dots, n$.

In this theorem $\infty - \infty$ and $(-\infty) - (-\infty)$ are always to be taken as
interior points of $T$. The necessary restrictions placed on the relation between $t_i$
and $t_{i-1}$ when a strategy is Bayes are given below for the four principal members
of the exponential family.

1) Binomial: $t_{i-1} + 1 > t_i > t_{i-1}$.

2) Poisson: $t_i > t_{i-1}$.

3) Gamma: $t_i > t_{i-1}$.

4) Normal with known $\sigma$: $t_i$ can be anything.
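
The four restrictions above amount to asking that each successive difference $t_i - t_{i-1}$ lie in the open interior of $T$. A minimal Python sketch of that check follows; the table of interiors, the family labels, and the function name are assumptions introduced for the illustration, and infinite critical numbers (for which the theorem adopts a special convention) are not handled.

```python
import math

# Interior of the range T of mu for each family, as an open interval (lo, hi).
INTERIOR_OF_T = {
    "binomial": (0.0, 1.0),
    "poisson": (0.0, math.inf),
    "gamma": (0.0, math.inf),
    "normal": (-math.inf, math.inf),
}


def satisfies_bayes_spacing(family, critical_numbers):
    """True if every difference t_i - t_{i-1} is interior to T (finite t_i only)."""
    lo, hi = INTERIOR_OF_T[family]
    steps = zip(critical_numbers, critical_numbers[1:])
    return all(lo < later - earlier < hi for earlier, later in steps)
```

For the binomial case, `[0.5, 1.0, 1.5]` passes while `[0.5, 1.5]` does not, since a step of exactly 1 is an endpoint of $T$ rather than an interior point.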

We defer the proof of this theorem until Section 5. 

As a corollary of Theorem 1 we have the following description of a complete 
class of strategies. 

THEOREM 2. Let $\mathcal{S}$ be the class of decision procedures whose critical numbers satisfy
$t_i - t_{i-1} \in T$ and whose randomizations have the property that if $t_i - t_{i-1}$ is an
endpoint of $T$, and $\varphi_{i-1}(t_{i-1}) > 0$, then $\varphi_i(t_i) = 1$. Then $\mathcal{S}$ is an essentially complete
class.

The question which arises at this point is whether or not the decision procedures
of class $\mathcal{S}$ are all admissible. For an important class of distributions of the
exponential family, we are able to say that they are. These are the distributions
for which the natural range $\Omega$ of $\omega$ is open, where by the natural range of $\omega$ we
mean the set of $\omega$ for which

$\int e^{\omega x}\,d\mu(x)$

is finite. We shall use $c$ and $d$ to denote the endpoints of the interval.

THEOREM 3. If the natural range of $\omega$ is open, then all the strategies of class $\mathcal{S}$ are
admissible.

On the basis of these theorems we can now describe a complete and admissible 
class of procedures for some simple examples. In the example of the biased coin, 
the natural range of $\omega$ is³ $(-\infty, \infty)$, and so the class $\mathcal{S}$ is both complete and
admissible. Thus one's strategy for such a game should depend only on the
number of heads and tails that have occurred; and if $\lambda_{ij}$ is the probability of
taking action 1 (calling heads) when $i$ heads and $j$ tails have been observed, then
this strategy is in $\mathcal{S}$ if and only if $\lambda_{ij} > 0$ implies that $\lambda_{i'j}$ and $\lambda_{ij'}$ are all equal
to one for $i' > i$ and $j' < j$. This criterion can be expressed loosely as follows:

If on a given play the player calls heads with a nonzero probability, and if a 
head occurs on that play, then the player must call heads on the following play. 
Thus this criterion seems to be one of consistency. 
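
The criterion can be checked mechanically on a finite table of calling probabilities. The Python sketch below is a literal transcription of the condition as stated, offered as an illustration rather than as the authors' procedure; the dictionary layout `lam[(i, j)]`, the horizon `max_plays`, and the function name are assumptions.

```python
def in_class_S(lam, max_plays):
    """Check the stated criterion on a table lam[(i, j)] for all i + j <= max_plays.

    lam[(i, j)] is the probability of calling heads after i heads and j tails.
    The criterion: lam[(i, j)] > 0 forces lam to equal 1 at every tabulated
    (i', j) with i' > i and at every (i, j') with j' < j.
    """
    for i in range(max_plays + 1):
        for j in range(max_plays + 1 - i):
            if lam[(i, j)] <= 0:
                continue
            more_heads = [lam[(k, j)] for k in range(i + 1, max_plays + 1 - j)]
            fewer_tails = [lam[(i, k)] for k in range(j)]
            if any(value != 1 for value in more_heads + fewer_tails):
                return False
    return True
```

A table that calls heads with probability 1 when heads lead, with probability 1/2 when heads and tails are tied, and never otherwise passes the check; a table that is 1/2 everywhere does not.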

In the case of a Poisson distribution, if $\lambda_{in}$ is the probability of betting on a
"successful" outcome on the $n$th play with $i$ successes having been observed,
then a procedure is in $\mathcal{S}$ if and only if $\lambda_{in} < 1$ implies that $\lambda_{in'} = 0$ for all $n' > n$.

For a normally distributed variable, however, we have $T = (-\infty, \infty)$,
and so the single-stage procedures can be related to one another in a completely 
arbitrary fashion, and the resulting multistage procedure will still be admissible. 
This special result was obtained independently by H. Rubin. 


4. Preliminary lemmas. This section is devoted to establishing the funda- 
mental lemmas needed throughout the sequel. 


LEMMA 1. If $h(\omega)$ changes sign at most once and $\omega_0$ is a change point, then

$g(x) = \int e^{x\omega} h(\omega)\,dH(\omega)$ with $dH(\omega) \ge 0$

has at most one zero, counting multiplicity, provided $H(\omega)$ does not concentrate its
measure fully in the set of zeros of $h(\omega)$ or $\omega_0$.

REMARK. A point $\omega_0$ is called a change point of $h(\omega)$ if $h(\omega')h(\omega) \le 0$ for $\omega' \le
\omega_0 \le \omega$, $\omega' \ne \omega$, with inequality for at least one choice of $\omega'$ and $\omega$.


³ If $p$ is the probability of heads, then $\omega = \log [p/(1 - p)]$.







PROOF. Consider the relation

$\frac{d}{dx}[g(x)e^{-x\omega_0}] = \int e^{x(\omega - \omega_0)}(\omega - \omega_0)h(\omega)\,dH(\omega).$

As $(\omega - \omega_0)h(\omega)$ has only one sign since both $(\omega - \omega_0)$ and $h(\omega)$ change signs at
the same point, we deduce by virtue of the hypothesis that $(d/dx)[g(x)e^{-x\omega_0}]$
is strictly of one sign, and hence $g(x)e^{-x\omega_0}$ is strictly monotone. This implies that
$g(x)$ can vanish at most once, counting multiplicities. If $H$ has positive measure
in both intervals $(-\infty, \omega_0)$ and $(\omega_0, +\infty)$, then there exists precisely one zero.
This is easy to see by letting $x$ tend to infinity and analyzing the rate of growth
of the integrand.

On further careful examination of the proof of Lemma 1, we notice that if
$g(x)$ possesses one change of sign, then both $g(x)$ and $h(\omega)$ change signs in the
same directions as their respective arguments increase.
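
A small numerical illustration of Lemma 1 may help fix ideas. The Python sketch below evaluates $g(x)$ for a discrete measure $H$ and counts its sign changes on a grid; the atoms, the choice $h(\omega) = \omega - 1$ (so that $\omega_0 = 1$), the grid, and the helper names are all assumptions made for the example.

```python
import math


def g(x, atoms, h):
    """g(x) = sum of mass * exp(x * w) * h(w) over the atoms (w, mass) of H."""
    return sum(mass * math.exp(x * w) * h(w) for w, mass in atoms)


def count_sign_changes(values):
    """Number of sign changes in a sequence, ignoring exact zeros."""
    signs = [v > 0 for v in values if v != 0]
    return sum(a != b for a, b in zip(signs, signs[1:]))
```

With atoms at 0, 0.5, 2 and 3, which put mass on both sides of $\omega_0 = 1$, the values of $g$ on a grid over $[-5, 5]$ change sign exactly once, and the rescaled values $g(x)e^{-x\omega_0}$ increase along the grid, as the proof indicates.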

COROLLARY. If

$h(x) = \int e^{x\omega}[L_1(\omega) - L_2(\omega)]\,dF(\omega),$

where $F$ does not concentrate fully at $\omega_0$, then $h(x)$ vanishes at most once, and if $F$
has measure in both $(-\infty, \omega_0)$ and $(\omega_0, \infty)$, then $h(x)$ vanishes precisely once.

LEMMA 2. If $s_1 - s_2$ does not belong to $T$, then $\beta(\omega)e^{\omega(s_1 - s_2)}$ is monotonic. (In
particular, provided $\mu$ has at least two points in its spectrum, if $s_1 - s_2 \le a$, then
$\beta(\omega)e^{\omega(s_1 - s_2)}$ is strictly decreasing; if $s_1 - s_2 \ge b$, then $\beta(\omega)e^{\omega(s_1 - s_2)}$ is strictly
increasing.)


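PROOF. Consider the function

$\beta(\omega)e^{\omega y} = \frac{e^{\omega y}}{\int e^{\omega x}\,d\mu(x)} = \frac{1}{\int e^{\omega(x - y)}\,d\mu(x)} = \frac{1}{m(\omega)}.$

We obtain

$m'(\omega) = \int (x - y)e^{\omega(x - y)}\,d\mu(x) \ge 0$ if $y \le a$,

and thus $m(\omega)$ is monotonic increasing, whence $\beta(\omega)e^{\omega y}$ is decreasing. A
similar argument applies to the case where $y \ge b$.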
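
For the coin-tossing case the function in Lemma 2 can be written down explicitly, which gives a quick sanity check. The Python sketch below assumes $\mu$ is counting measure on $\{0, 1\}$, so that $T = [0, 1]$ and $\beta(\omega) = 1/(1 + e^{\omega})$; the function names and the grid of $\omega$ values are illustrative assumptions.

```python
import math


def scaled_beta(omega, y):
    """beta(omega) * exp(omega * y) for mu = counting measure on {0, 1}."""
    return math.exp(omega * y) / (1.0 + math.exp(omega))


def is_strictly_monotone(values):
    """True if the sequence is strictly increasing or strictly decreasing."""
    increasing = all(a < b for a, b in zip(values, values[1:]))
    decreasing = all(a > b for a, b in zip(values, values[1:]))
    return increasing or decreasing
```

On a grid of $\omega$ values, the function is strictly decreasing for $y \le 0$ and strictly increasing for $y \ge 1$, whereas for an interior value such as $y = 0.5$ it rises and then falls.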

The next lemma will be useful in determining the admissible strategies of the 
multistage decision problem. 

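LEMMA 3. Let the natural range $\Omega$ be open. If $x$ is interior to $T$, then $\beta(\omega)e^{\omega x} \to 0$
as $\omega \to$ the end points of $\Omega$.

Suppose first that $\Omega = (-\infty, \infty)$. Let $A = \{\xi \mid \xi > x + \epsilon\}$. Since $x \in \operatorname{int} T$,
for $\epsilon$ sufficiently small, $\mu(A) > 0$.

$\beta(\omega)e^{\omega x} = \frac{1}{\int e^{\omega(\xi - x)}\,d\mu(\xi)} \le \frac{1}{\int_A e^{\omega(\xi - x)}\,d\mu(\xi)} \le \frac{1}{\int_A e^{\omega\epsilon}\,d\mu(\xi)}.$

As $\omega \to +\infty$, the last quantity $\to 0$. A similar argument proves the assertion
for $\omega \to -\infty$.

Suppose $\Omega = (a, b)$ where, say, $b$ is finite. Form the ratio

$\frac{\int e^{\omega\xi}\,d\mu(\xi)}{\int e^{\omega(\xi - x)}\,d\mu(\xi)}$

for $\omega \in \Omega$. As $\omega \to b$, the ratio tends to $e^{bx}$. But as $\omega \to b$, $\int e^{\omega\xi}\,d\mu(\xi) \to \infty$ by
virtue of the fact that $\Omega$ is open. Hence $\int e^{\omega(\xi - x)}\,d\mu(\xi)$ must tend to $\infty$ as $\omega \to b$
if the ratio is to tend to the finite limit $e^{bx}$ at $b$. A similar argument at the point
$a$ proves the assertion, whenever $a$ is finite.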

It should be noticed that if an endpoint of $\Omega$ is infinite, then the requirement
that $x \in \operatorname{int} T$ is essential, whereas if the endpoint is finite, this is not needed.
If $\Omega$ is not open then for every $x$, $e^{\omega x}\beta(\omega)$ need not converge to zero as $\omega$ tends to
the endpoints of $\Omega$.

The following examples will illustrate these last facts:

(i) If $d\mu(x) = e^{-x^2}\,dx$, then $\Omega = (-\infty, \infty)$ is open.

(ii) If $d\mu(x) = x^{\alpha}\,dx$ for $x > 0$ (with $\alpha \ge 0$) and $d\mu(x) = 0$ for $x \le 0$,
then $\Omega = (-\infty, 0)$ is open. Note that for $x > 0$ interior to $T$, $e^{\omega x}\beta(\omega) =
C|\omega|^{\alpha + 1}e^{\omega x} \to 0$ as $\omega \to -\infty$, while for $x = 0$, $e^{\omega x}\beta(\omega) = C|\omega|^{\alpha + 1} \to +\infty$.

(iii) If $d\mu(x) = e^{-|x|}\,dx$, then $\Omega = (-1, 1)$.

(iv) If $d\mu(x) = [e^{-|x|}/(1 + x^2)]\,dx$, then $\Omega = [-1, 1]$ is not open.

(v) Any exponential distribution where the spectrum of $\mu$ is a compact
set has $\Omega = (-\infty, \infty)$ and thus falls in the domain of validity of Lemma 3.


5. Bayes procedures and an essentially complete class. A complete class of
procedures, generally too large, is easy to determine. In fact, the class of procedures
$s = (\varphi_1, \dots, \varphi_n)$, where $\varphi_i$ is a monotone procedure for the $i$th stage,
is complete. A monotone procedure is determined by a critical number $t_i$ such
that $\varphi_i(z_i) = 1$ for $z_i < t_i$ and $\varphi_i(z_i) = 0$ for $z_i > t_i$ (with possible randomization
relevant on $t_i$ if $\mu_i\{t_i\} > 0$).

At each stage the risk function is

$\rho_i(\omega, \varphi_i) = \int \{L_1(\omega)\varphi_i(z_i) + L_2(\omega)[1 - \varphi_i(z_i)]\}[\beta(\omega)]^i e^{\omega z_i}\,d\mu_i(z_i).$

This is the usual risk function for a single-stage, one-sided decision problem with
random variable $z_i$, so any procedure which is not monotone for the $i$th stage
can be improved upon by inserting a monotone procedure at the $i$th stage.
(See [2].) Our object here for the multistage decision problem is to obtain an 
essentially minimal complete class of procedures. To do this, we first character- 
ize the Bayes solutions. 







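A procedure $s^*$ is Bayes against an a priori distribution $F(\omega)$ on $\Omega$ if
$\rho(F, s^*) = \min_s \rho(F, s)$, where $\rho(F, s) = \int \sum_i \rho_i(\omega, \varphi_i)\,dF(\omega)$.
Interchanging the order of integration,
$$\rho(F, s) = \sum_{i=1}^{n} \int \varphi_i(z_i)\,d\mu_i(z_i) \int e^{z_i\omega}[L_1(\omega) - L_2(\omega)][\beta(\omega)]^{i}\,dF(\omega) + C(F),$$
where $C(F)$ is a function of $F$ which does not involve $s$. It is clear now what $s$
must be in order to minimize this expression. The optimal procedure $s = (\varphi_1, \dots, \varphi_n)$ is

$$\varphi_i(z_i) = \begin{cases} 1, & \text{if } \int e^{z_i\omega}[L_1(\omega) - L_2(\omega)][\beta(\omega)]^{i}\,dF(\omega) < 0,\\ 0, & \text{if } \int e^{z_i\omega}[L_1(\omega) - L_2(\omega)][\beta(\omega)]^{i}\,dF(\omega) > 0.\end{cases}$$

The function

$$g(z_i) = \int e^{z_i\omega}[L_1(\omega) - L_2(\omega)][\beta(\omega)]^{i}\,dF(\omega)$$

changes sign once, since $L_1(\omega) - L_2(\omega)$ changes sign once, and it changes in the
same order as $L_1(\omega) - L_2(\omega)$. (This assumes that $F(\omega)$ has sets of positive measure
both above and below $\omega_0$. The reader can easily complete the elementary
analysis necessary when this is not satisfied.)

Thus there exists a $t_i$ such that

$$\varphi_i(z_i) = \begin{cases} 1, & \text{for } z_i < t_i,\\ 0, & \text{for } z_i > t_i.\end{cases}$$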
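The critical number $t_i$ is the point where $g$ changes sign, so for a prior with finitely many atoms it can be located by bisection. The sketch below is illustrative only and rests on assumptions not made in the text: $\mu$ is taken to be the standard normal law (so that $\beta(\omega) = e^{-\omega^2/2}$), the prior has two atoms, and $L_1 - L_2$ takes the values $-1$ and $+1$ there. The names `beta`, `g`, and `critical_number` are hypothetical.

```python
import math

def beta(omega):
    # Assumed family: mu is the standard normal law, so the integral of
    # exp(x*omega) dmu(x) equals exp(omega^2 / 2) and beta is its reciprocal.
    return math.exp(-0.5 * omega * omega)

def g(z, stage, prior):
    """Discrete-prior version of g(z_i): prior is a list of
    (omega, mass, L1(omega) - L2(omega)) triples."""
    return sum(mass * math.exp(z * omega) * diff * beta(omega) ** stage
               for omega, mass, diff in prior)

def critical_number(stage, prior, lo=-50.0, hi=50.0, iters=200):
    """Bisection for the single sign change of g (negative on the left)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid, stage, prior) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical prior: mass 1/4 at omega = -1 (L1 - L2 = -1),
# mass 3/4 at omega = 2 (L1 - L2 = +1).
prior = [(-1.0, 0.25, -1.0), (2.0, 0.75, 1.0)]
t1 = critical_number(1, prior)
t2 = critical_number(2, prior)
```

For this particular prior the equation $g(z) = 0$ can be solved by hand, giving $t_i = i/2 - (\ln 3)/3$, so successive critical numbers differ by $1/2$.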


THEOREM 1. Let $F$ be an a priori probability measure on $\Omega$ with sets of positive
measure above and below $\omega_0$. If $s = (\varphi_1, \dots, \varphi_n)$ is a Bayes procedure against $F$
with critical numbers $(t_1, \dots, t_n)$, then $t_i - t_{i-1}$ must be an interior point of $\Gamma$
for $i = 2, \dots, n$.

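PROOF OF THEOREM 1:

Let $\Omega = (c, d)$, where $d$ may be $+\infty$ and $c$ may be $-\infty$. For $i \ge 2$,

$$\int_c^d e^{t_i\omega}[L_1(\omega) - L_2(\omega)][\beta(\omega)]^{i}\,dF(\omega) = 0, \tag{5}$$

and

$$\int_c^d e^{t_{i-1}\omega}[L_1(\omega) - L_2(\omega)][\beta(\omega)]^{i-1}\,dF(\omega) = 0. \tag{6}$$

(5) and (6) can be written as

$$\int_{\omega_0}^{d} e^{t_i\omega}L_1(\omega)\beta^{i}(\omega)\,dF(\omega) = \int_{c}^{\omega_0} e^{t_i\omega}L_2(\omega)\beta^{i}(\omega)\,dF(\omega), \tag{7}$$

$$\int_{\omega_0}^{d} e^{t_{i-1}\omega}L_1(\omega)\beta^{i-1}(\omega)\,dF(\omega) = \int_{c}^{\omega_0} e^{t_{i-1}\omega}L_2(\omega)\beta^{i-1}(\omega)\,dF(\omega).$$

Suppose $t_i - t_{i-1}$ is not interior to $\Gamma$, say, $\ge b$. Then by Lemma 2,

$$\begin{aligned}
\int_{\omega_0}^{d} e^{t_i\omega}L_1(\omega)\beta^{i}(\omega)\,dF(\omega)
&= \int_{\omega_0}^{d} e^{(t_i - t_{i-1})\omega}\beta(\omega)\,e^{t_{i-1}\omega}L_1(\omega)\beta^{i-1}(\omega)\,dF(\omega)\\
&> e^{(t_i - t_{i-1})\omega_0}\beta(\omega_0)\int_{\omega_0}^{d} e^{t_{i-1}\omega}L_1(\omega)\beta^{i-1}(\omega)\,dF(\omega)\\
&= e^{(t_i - t_{i-1})\omega_0}\beta(\omega_0)\int_{c}^{\omega_0} e^{t_{i-1}\omega}L_2(\omega)\beta^{i-1}(\omega)\,dF(\omega)\\
&> \int_{c}^{\omega_0} e^{(t_i - t_{i-1})\omega}\beta(\omega)\,e^{t_{i-1}\omega}L_2(\omega)\beta^{i-1}(\omega)\,dF(\omega)\\
&= \int_{c}^{\omega_0} e^{t_i\omega}L_2(\omega)\beta^{i}(\omega)\,dF(\omega),
\end{aligned}$$

which is impossible by virtue of (7). Thus the theorem is proved.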

THEOREM 2. Let $\mathcal{S}$ be the class of decision procedures whose critical numbers
satisfy $t_i - t_{i-1} \in \Gamma$ and whose randomizations have the property that if $t_i - t_{i-1}$
is an endpoint of $\Gamma$, and $\varphi(t_{i-1}) > 0$, then $\varphi(t_i) = 1$. Then $\mathcal{S}$ is an essentially
complete class.

PROOF. Let $\omega_1, \omega_2, \omega_3, \dots$ represent a dense set of $\omega_i$, not including $\omega_0$, which
includes all other discontinuities of $L_1(\omega) - L_2(\omega)$. Considering only $\omega_1, \omega_2, \dots, \omega_m$
as the pure states of nature, and in view of Theorem 1, we know that all
Bayes procedures have the form as described in the theorem with $t_i - t_{i-1}$ in the
interior of $\Gamma$. Thus these procedures constitute an essentially complete class
when the states of nature are $\omega = \omega_i$ $(i = 1, \dots, m)$. (See [1].) Hence if $s$ is any
procedure, there exists a procedure $s^{(m)}$ of the type indicated in the theorem where

$$\rho(\omega_i, s) \ge \rho(\omega_i, s^{(m)}), \qquad i = 1, \dots, m,$$

and hence

$$\sum_{k} [\beta(\omega_i)]^{k}[L_1(\omega_i) - L_2(\omega_i)] \int e^{\omega_i z}(\varphi_k - \varphi_k^{(m)})\,d\mu_k(z) \ge 0.$$

By the usual diagonal process we can select a limit strategy $s^0$ from $s^{(m)}$, that is,
$\varphi_i^{(r)}(x)$ converges to $\varphi_i^{0}(x)$ for every $x$, where $r$ is a subsequence of $m$. It is easy
to see that for each $i$,

$$\rho(\omega_i, s) \ge \rho(\omega_i, s^0).$$

Since the $\omega_i$ are dense in $\Omega$, it follows that the above inequality can be extended to
hold for all $\omega$. Moreover, each $s^{(m)}$ is in $\mathcal{S}$, and $\mathcal{S}$ is clearly closed with respect to
pointwise convergence. Thus the limit procedure $s^0$ must also be in $\mathcal{S}$.







6. Admissibility. In the preceding section an essentially complete class of 
procedures was determined. It is now our purpose to investigate the admissibility 
of such procedures. Throughout this section we assume that $\Omega$ is open. The
analysis will be divided into two cases: (a) where du(x) is atomless, and (b) 
where du(x) can possess atoms. 

Case a. The $\sigma$-finite measure $d\mu(x)$ is atomless. For this situation the collection
of procedures in $\mathcal{S}$ are uniquely characterized by the critical values $(t_1, \dots, t_n)$,
where

$$\varphi_i(z) = \begin{cases} 1, & z < t_i,\\ 0, & z > t_i,\end{cases}$$

and any randomization at $t_i$ is of no consequence in terms of the expected risk.
Consequently, a strategy $s$ and the collection of critical numbers $(t_1, t_2, \dots, t_n)$
shall be referred to interchangeably.

THEOREM 3. All monotone strategies $s = (t_1, t_2, \dots, t_n)$ with $t_{k+1} - t_k \in \Gamma$
for $k = 1, 2, \dots, n - 1$ are admissible.

PROOF. Suppose not. Then for some $s = (t_1, \dots, t_n)$ with $t_{k+1} - t_k \in \Gamma$,
there exists a strategy $s'$ which is better. If $s'$ is not a monotone strategy with
$t'_{k+1} - t'_k \in \Gamma$, then by the completeness of this class of strategies, there exists a
monotone strategy $s^* = (t_1^*, \dots, t_n^*)$ with $t^*_{k+1} - t^*_k \in \Gamma$ better than $s'$.

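By definition,

$$\rho(\omega, s) = \sum_{k=1}^{n} [\beta(\omega)]^{k} \int e^{\omega z_k}\{\varphi_k(z_k)L_1(\omega) + (1 - \varphi_k(z_k))L_2(\omega)\}\,d\mu_k(z_k),$$

$$\begin{aligned}
\rho(\omega, s) - \rho(\omega, s^*) &= \sum_{k=1}^{n} [\beta(\omega)]^{k} \int e^{\omega z}(\varphi_k - \varphi_k^*)[L_1(\omega) - L_2(\omega)]\,d\mu_k(z)\\
&= [L_1(\omega) - L_2(\omega)] \sum_{k=1}^{n} [\beta(\omega)]^{k} \int_{t_k^*}^{t_k} e^{\omega z}\,d\mu_k \ge 0.
\end{aligned}$$

The theorem will be proved if it can be shown that this holds true only if $t_k = t_k^*$
for all $k$, or equivalently, $\rho(s, \omega) = \rho(s^*, \omega)$ for all $\omega$.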

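For the sake of definiteness suppose that $t_1 \le t_1^*$. Let $i_1, i_2, i_3, \dots$
be defined by the following relation.

$$\begin{aligned}
&t_1 \le t_1^*,\ \dots,\ t_{i_1 - 1} \le t^*_{i_1 - 1},\quad t_{i_1} < t^*_{i_1},\\
&t_{i_1 + 1} \ge t^*_{i_1 + 1},\ \dots,\ t_{i_2 - 1} \ge t^*_{i_2 - 1},\quad t_{i_2} > t^*_{i_2},\\
&t_{i_2 + 1} \le t^*_{i_2 + 1},\ \dots,\ t_{i_3 - 1} \le t^*_{i_3 - 1},\quad t_{i_3} < t^*_{i_3},
\end{aligned}$$

and so on.
The strict inequality signs above mean that the measure $\mu_k$ has positive measure
in the half-open interval between $t_k$ and $t_k^*$. Otherwise $t_k$ and $t_k^*$ define the same
test and are to be taken equal.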







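Let $\Omega = (c, d)$, where $c$ or $d$ or both may be infinite. The first step is to show
that as $\omega \to d$,

$$R_{i_1 j} = \frac{\beta(\omega)^{j}\left|\int_{t_j^*}^{t_j} e^{\omega t}\,d\mu_j(t)\right|}{\beta(\omega)^{i_1}\left|\int_{t_{i_1}^*}^{t_{i_1}} e^{\omega t}\,d\mu_{i_1}(t)\right|} \to 0,$$

for $i_1 + 1 \le j \le i_2$. If $d > 0$, then for $\omega$ near $d$,

$$\left|\int_{t_j^*}^{t_j} e^{\omega t}\,d\mu_j(t)\right| \le c_1 e^{t_j\omega}; \tag{9}$$

and as $t_{i_1} < t^*_{i_1}$, we get

$$\int_{t_{i_1}}^{t^*_{i_1}} e^{\omega t}\,d\mu_{i_1}(t) \ge c_2(\epsilon) \exp\{(t_{i_1} + \epsilon)\omega\} > 0, \tag{10}$$

where $\epsilon > 0$, and $c_2(\epsilon)$ is a constant which depends only on $\epsilon$ and not on $\omega$.
This is certainly true for $\epsilon$ sufficiently small, since the half-open interval $(t_{i_1}, t^*_{i_1}]$
has positive measure by assumption.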

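Since $t_{i_1} < t^*_{i_1}$ and $t_{i_1+1} \ge t^*_{i_1+1}$, $t_{i_1+1} - t_{i_1} > t^*_{i_1+1} - t^*_{i_1}$. But both $t_{i_1+1} - t_{i_1}$
and $t^*_{i_1+1} - t^*_{i_1}$ are in $\Gamma$. Thus for $\epsilon > 0$ sufficiently small, $t_{i_1+1} - t_{i_1} - \epsilon$ will
be in the interior of $\Gamma$. In fact, two sufficiently small numbers $\epsilon > 0$, $\eta > 0$
can be found such that $t_{i_1+1} - t_{i_1} - \epsilon'$ is in the interior of $\Gamma$ for $\epsilon - \eta \le \epsilon' \le \epsilon + \eta$.
In the bound (10), above, choose $\epsilon > 0$ sufficiently small so that not only
is (10) satisfied, but $\epsilon$ also has the property just indicated. Combining (9) and
(10), we have

$$R_{i_1 j} \le c_3(\epsilon)\,\beta(\omega)^{j - i_1} \exp\{(t_j - t_{i_1} - \epsilon)\omega\}; \tag{11}$$

$c_3(\epsilon)$ is independent of $\omega$, $\epsilon > 0$, and $t_{i_1+1} - t_{i_1} - \epsilon'$ is in $\operatorname{int}\Gamma$ for
$\epsilon - \eta \le \epsilon' \le \epsilon + \eta$.

Since $t_k - t_{k-1} \in \Gamma$, $k = i_1 + 2, \dots, j$, for $\eta_k$ near zero and of the correct sign,
$t_k - t_{k-1} - \eta_k$ will be in the interior of $\Gamma$. Choose each $\eta_k$ sufficiently small in
absolute value so that $\sum_{k=i_1+2}^{j} |\eta_k| < \eta$. Then the $\epsilon'$ defined by $\epsilon' = \epsilon - \sum_{k=i_1+2}^{j} \eta_k$
satisfies $\epsilon - \eta \le \epsilon' \le \epsilon + \eta$. Splitting $\epsilon$ up into $\epsilon' + \sum \eta_k$, the upper bound
(11) can be rewritten as

$$R_{i_1 j} \le c_3(\epsilon)\,\beta(\omega) \exp\{(t_{i_1+1} - t_{i_1} - \epsilon')\omega\} \prod_{k=i_1+2}^{j} \beta(\omega) \exp\{(t_k - t_{k-1} - \eta_k)\omega\}.$$

Since $t_{i_1+1} - t_{i_1} - \epsilon'$ is in $\operatorname{int}\Gamma$ and $t_k - t_{k-1} - \eta_k$ are in
$\operatorname{int}\Gamma$ for $k = i_1 + 2, \dots, j$, each term involving $\omega$ in the above bound $\to 0$ as
$\omega \to d$ by Lemma 3. Thus $R_{i_1 j} \to 0$ as $\omega \to d > 0$ for $j = i_1 + 1, \dots, i_2$. When
$d < 0$, a similar proof using $t_j^*$, $t^*_{i_1}$ instead of $t_j$, $t_{i_1}$ and $t^*_{i_1+1} - t^*_{i_1} + \epsilon$ for
$\epsilon > 0$ instead of $t_{i_1+1} - t_{i_1} - \epsilon$ in the upper bound gives the same result.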

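An analogous proof shows that $R_{i_1 j} \to 0$ for $i_1 + 1 \le j \le i_2$ as $\omega \to c$. In a
similar manner, if $R_{i_2 j}$ is defined by

$$R_{i_2 j} = \frac{\beta(\omega)^{j}\left|\int_{t_j^*}^{t_j} e^{\omega t}\,d\mu_j(t)\right|}{\beta(\omega)^{i_2}\left|\int_{t_{i_2}^*}^{t_{i_2}} e^{\omega t}\,d\mu_{i_2}(t)\right|},$$

then $R_{i_2 j} \to 0$ as $\omega \to c, d$ for $i_2 + 1 \le j \le i_3$. Similar results are obtained for
$R_{i_3 j}$, $R_{i_4 j}$, etc.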

What this shows is that as $\omega$ tends to the end-points of $\Omega$, the $i_1$ term is of a
larger order of magnitude than the later terms up to and including $i_2$. By the
same reasoning, the $i_2$ term is of larger order of magnitude than the later terms
up to and including $i_3$, and so on. Thus as $\omega \to c$ or $d$, the $i_1$ term dominates
the remaining finite number of terms. Since the first $i_1$ terms have the same sign,
the sign of

$$\sum_{k=1}^{n} [\beta(\omega)]^{k} \int_{t_k^*}^{t_k} e^{\omega z}\,d\mu_k$$

is the same as the sign of

$$\int_{t^*_{i_1}}^{t_{i_1}} e^{\omega z}\,d\mu_{i_1},$$

which is one sign when either $\omega \to c$ or when $\omega \to d$. But the sign of $L_1(\omega) - L_2(\omega)$
for $\omega$ near $c$ is negative while for $\omega$ near $d$ it is positive. Thus for $\omega$ near $c$,
$\rho(\omega, s) - \rho(\omega, s^*)$ is of one sign, and for $\omega$ near $d$ it is of the opposite sign. But
this contradicts the fact that $\rho(\omega, s) - \rho(\omega, s^*) \ge 0$ for all $\omega$. Thus the assumption
that there exists an $i_1$ such that $t_{i_1} < t^*_{i_1}$ is incorrect, and the following
must hold:

$$t_1 = t_1^*,\quad t_2 = t_2^*,\quad \dots,\quad t_{i_1} = t^*_{i_1}.$$

The first $i_1$ terms in $\rho(\omega, s) - \rho(\omega, s^*)$ are zero. The argument can be repeated
again on $i_2$, $i_3$, etc. The conclusion is that $t_k = t_k^*$ for all $k$. Thus $s = s^*$ and
$\rho(\omega, s) = \rho(\omega, s^*)$. This completes the proof of the theorem.
Case b. The restriction that $d\mu(x)$ is atomless is removed.

THEOREM 4. Without any assumptions on $\mu$, if $s$ is a procedure in $\mathcal{S}$ where $t_j - t_{j-1}$
is interior to $\Gamma$ for every $j$, then $s$ is admissible.

PROOF. Let $s^*$ denote a strategy of $\mathcal{S}$ which improves $s$. The same domination
argument used in Theorem 3 when $t_j - t_{j-1}$ is interior to $\Gamma$ shows that
$\mu_j\{t_j\}(\lambda_j - \lambda_j^*) = 0$, and thus $\rho(s, \omega) = \rho(s^*, \omega)$. This therefore implies that
$s$ is admissible.

THEOREM 5. If the natural range $\Omega$ of $\omega$ is open, then each decision procedure in
$\mathcal{S}$ is admissible.

PROOF. Theorem 4 already disposes of the case in which the differences of
all critical numbers are interior to $\Gamma$. For the sake of simplicity of exposition,
we shall assume that the right-hand endpoint $b$ of $\Gamma$ is infinite. The case in which
both endpoints are finite has a completely analogous but tedious proof. Let $s$
be a procedure in $\mathcal{S}$, and let $s^*$ be a procedure which betters $s$. Since $\mathcal{S}$ is complete,
we may assume that $s^*$ is in $\mathcal{S}$.

The critical numbers of $s$ are denoted as previously by $t_i$, and the respective
randomizations at $t_i$ by $\lambda_i$. The notation $t_i > t_i^*$ for this case shall mean either
$t_i > t_i^*$ or $t_i = t_i^*$ and $\lambda_i > \lambda_i^*$, where the last possibility only has relevance
when $\mu_i\{t_i\} > 0$. Let the indices $i_1, i_2, i_3, \dots$ be defined as in the proof of
Theorem 3, subject to the new interpretation of $t_i \ge t_i^*$.

Pursuing a similar analysis as in Theorem 3, it remains only to show that

$$R_{i_1 j} = \frac{\beta(\omega)^{j}\left|\int e^{\omega t}(\varphi_j - \varphi_j^*)\,d\mu_j\right|}{\beta(\omega)^{i_1}\left|\int e^{\omega t}(\varphi_{i_1} - \varphi^*_{i_1})\,d\mu_{i_1}\right|}$$

tends to zero as $\omega$ tends to the end points of $\Omega$ for all $j = i_1 + 1, \dots, i_2$. The
need to represent the expression for $R_{i_1 j}$ in terms of $\varphi_j$, $\varphi_j^*$, $\varphi_{i_1}$, and $\varphi^*_{i_1}$ is that
the randomization of $\varphi_j$, $\varphi_j^*$, etc., at the critical numbers may contribute to
the integral.

The only situation necessitating an additional argument distinct from that
used in the proof of Theorem 3 arises when $t_{i_1} = t^*_{i_1}$ with $\lambda_{i_1} < \lambda^*_{i_1}$. The expression
$R_{i_1 j}$ becomes

$$\frac{\beta(\omega)^{j}\left|\int e^{\omega t}(\varphi_j - \varphi_j^*)\,d\mu_j\right|}{C\,\beta(\omega)^{i_1}\mu_{i_1}\{t_{i_1}\}\,e^{t_{i_1}\omega}}.$$

For the term $j$ $(i_1 + 1 \le j \le i_2)$, if $t_j = t_{i_1} + (j - i_1)a$, then the same must
hold for $t_j^*$, since $t_j \ge t_j^*$; and $\lambda_j = \lambda_j^* = 1$, since $s$ and $s^*$ both belong to $\mathcal{S}$.
Consequently $t_j = t_j^*$. Therefore we may assume that $t_j > t_{i_1} + (j - i_1)a$.
If also $t_j^* > t_{i_1} + (j - i_1)a = t^*_{i_1} + (j - i_1)a$, then the argument can be carried
through as in Theorem 3, so let us further suppose that $t_j^* = t_{i_1} + (j - i_1)a$.
This requires that $\lambda_j^* = 1$. It is now asserted that

$$\beta(\omega)^{j - i_1} e^{-t_{i_1}\omega} \int e^{\omega t}(\varphi_j - \varphi_j^*)\,d\mu_j \tag{12}$$

tends to zero as $\omega \to$ the endpoints of $\Omega$, where the range of integration of $t$
is a region which is included in the half-open interval $t_j \ge t > t_{i_1} + (j - i_1)a$.
To establish this last fact, we distinguish two cases according as $\mu\{a\} > 0$
or $\mu\{a\} = 0$.
I. $\mu\{a\} > 0$: Note that

$$\frac{e^{(t - t_{i_1})\omega}}{[\mu\{a\}e^{a\omega}]^{r}} \to 0$$

as $\omega$ tends to the left-hand endpoint of $\Omega$ for $t > t_{i_1} + ar$ $(r = j - i_1)$. Of course,
for each $t$ in the same region by Lemma 3,

$$\beta(\omega)^{j - i_1} e^{(t - t_{i_1})\omega} \to 0.$$

As $d\mu$ is integrable on any closed interval contained in the spectrum of $\mu$, which
may include an endpoint if it belongs to the point spectrum, the expression in
(12) on account of the Lebesgue convergence theorem tends to zero as $\omega \to c$
where $\Omega = (c, d)$. If $d < \infty$ the argument is the same as at $c$. If $d = \infty$, then
$\beta(\omega)$ vanishes faster than any exponential as $\omega \to d$, and the result in (12) is
immediately clear.
II. $\mu\{a\} = 0$. The formula (12) is bounded by


2 


[B(w)]7~* [ gy dy;(t), 


(ti, +G—i))a]+ 


which on some simple calculation reduces to 


i] y~ du;,(t + t,,). 
a+ 


If $c = -\infty$, then as $\omega \to -\infty$ it is easy to see that


x 
| e du(t + t;,) 3 0. 
a+ 
If $c$ is finite, then the argument showing that (12) tends to zero is similar to
that of I above. The reasoning involved at the other endpoint $d$ is similar and
is omitted.

Having thus established the assertion of (12) the remainder of the reasoning 
proceeds as in the proof of Theorem 3. 


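7. Converse of Theorem 1.

THEOREM 6. If a decision procedure $s$ has critical numbers $t_j$ with $t_j - t_{j-1}$
interior to $\Gamma$, then $s$ is Bayes against an a priori distribution fully concentrated at
$n + 1$ points, $n$ being the total number of plays of the game.

PROOF. Let us consider a distribution $F$ whose spectrum is the $n + 1$ points
$\omega_i$. In order for $s$ to be Bayes against $F$, it is necessary and sufficient that

$$\sum_{i=1}^{n+1} e^{t_k\omega_i}[\beta(\omega_i)]^{k}(L_1 - L_2)(\omega_i)\,a_i = 0, \qquad k = 1, \dots, n. \tag{13}$$

The solutions $y_i = [a_i(L_1 - L_2)(\omega_i)]$ of these equations are proportional to the
cofactors of the last row of

$$\begin{vmatrix}
e^{\omega_1 t_1}\beta(\omega_1) & e^{\omega_2 t_1}\beta(\omega_2) & \cdots & e^{\omega_{n+1} t_1}\beta(\omega_{n+1})\\
e^{\omega_1 t_2}\beta^{2}(\omega_1) & e^{\omega_2 t_2}\beta^{2}(\omega_2) & \cdots & e^{\omega_{n+1} t_2}\beta^{2}(\omega_{n+1})\\
\vdots & \vdots & & \vdots\\
e^{\omega_1 t_n}\beta^{n}(\omega_1) & e^{\omega_2 t_n}\beta^{n}(\omega_2) & \cdots & e^{\omega_{n+1} t_n}\beta^{n}(\omega_{n+1})\\
a_1 & a_2 & \cdots & a_{n+1}
\end{vmatrix}.$$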

The values $\omega_1$ and $\omega_2$ are chosen fixed subject only to the inequalities
$\omega_1 < \omega_0 < \omega_2$.

The remaining $\omega_i$, $i = 3, \dots, n + 1$, are chosen near $c$ and $d$ appropriately
so that $\beta(\omega_i) \exp\{\omega_i(t_j - t_{j-1})\}$ are all near zero but each of smaller order of
magnitude. In fact, each $\omega_i$ is chosen successively, each closer to $c$ or $d$, so that
$\beta(\omega_3) \exp\{\omega_3(t_j - t_{j-1})\}$ is of larger order of magnitude than $\beta(\omega_4) \exp\{\omega_4(t_j - t_{j-1})\}$,
which in turn is of larger order of magnitude than $\beta(\omega_5) \exp\{\omega_5(t_j - t_{j-1})\}$, etc.







This careful selection of $\omega_i$ implies that the cofactor of $a_1$ is dominated by
$(-1) \exp\{\omega_2 t_n\} \exp\{\omega_3 t_{n-1}\} \cdots \exp\{\omega_{n+1} t_1\}\,[\beta^{n}(\omega_2)\beta^{n-1}(\omega_3) \cdots \beta(\omega_{n+1})]$ and
thus is negative, and in a similar manner we can infer that the cofactor of $a_2$ is
positive. Thus according to our choice of $\omega_1$ and $\omega_2$, it follows that $a_1 > 0$,
$a_2 > 0$.

The sign of the cofactors $a_r$ $(r = 3, \dots, n + 1)$ is independent of the choice
of $\omega_i$, provided only that $\beta(\omega_i) \exp\{\omega_i(t_j - t_{j-1})\}$ has the right order of magnitude
as described above. The exact sign of $a_r$ is

| g@itn—i evzin-1 
sign of cofactor a, = (—1)’ 


e*'*§B(w) e*?""B(w) 


The $\omega_i$ are then selected near either $c$ or $d$, so that they produce the correct magnitude
and so that $a_i > 0$. This entails choosing $\omega_r$ near $c$ and $d$ alternately.
The Bayes distribution is of the form: place the mass $\lambda_i = a_i / \sum_j a_j$ at $\omega_i$; then

$$\int e^{t_k\omega}[\beta(\omega)]^{k}(L_1 - L_2)(\omega)\,dF(\omega) = 0$$

reduces to the equations (13), and hence the Bayes strategy for this $F$ is the
given $s$. This completes the proof.
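For a single stage ($n = 1$) the system (13) is one linear equation in two unknown masses, which makes the construction easy to trace by hand. The sketch below is only an illustration under assumptions that are not in the text: $\mu$ is taken to be the standard normal law, so $\beta(\omega) = e^{-\omega^2/2}$, and $L_1 - L_2$ is $-1$ at the lower point and $+1$ at the upper point. The helper `two_point_prior` is a hypothetical name.

```python
import math

def beta(omega):
    # Assumed family: mu standard normal, so beta(omega) = exp(-omega^2 / 2).
    return math.exp(-0.5 * omega * omega)

def two_point_prior(t, omega_low, omega_high, diff_low=-1.0, diff_high=1.0):
    """Masses (lambda_low, lambda_high) solving the n = 1 case of (13):
    a_low * c_low + a_high * c_high = 0, with c = e^{t w} beta(w) (L1 - L2)(w)."""
    c_low = math.exp(t * omega_low) * beta(omega_low) * diff_low      # negative
    c_high = math.exp(t * omega_high) * beta(omega_high) * diff_high  # positive
    a_low, a_high = c_high, -c_low  # both positive; the equation holds exactly
    total = a_low + a_high
    return a_low / total, a_high / total

# Critical number chosen so that the answer is easy to verify by hand.
t = 0.5 - math.log(3.0) / 3.0
masses = two_point_prior(t, -1.0, 2.0)
```

With these inputs the ratio of the two masses is $e^{1.5 - 3t} = 3$, so the prior puts mass $1/4$ on $\omega = -1$ and $3/4$ on $\omega = 2$.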

The above proof was suggested by J. Pratt and replaces a more cumbersome 
construction by the authors, which showed also that each Bayes strategy was 
Bayes against a strategy fully concentrated at only n points. 


REFERENCES 


[1] D. Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions, John
Wiley and Sons, Inc., New York, 1954.

[2] S. Karlin and H. Rubin, "The theory of decision procedures for distributions with
monotone likelihood ratio," submitted to Ann. Math. Stat.





ON A CHARACTERIZATION OF THE NORMAL DISTRIBUTION 
FROM PROPERTIES OF SUITABLE LINEAR STATISTICS 


By R. G. Laha
Indian Statistical Institute 


1. Introduction and Summary. In recent years, problems related to the 
characterization of the normal distribution from the property of stochastic 
independence of linear functions of independent random variables have been 
investigated by various authors. The most general result in this direction is 
that obtained independently by Darmois [4] and Skitovich [14], who proved 
that if there exist two linear functions 


$$X = \sum_{j=1}^{n} a_j x_j \quad\text{and}\quad Y = \sum_{j=1}^{n} b_j x_j \quad\text{with } a_j b_j \ne 0$$

$(j = 1, 2, \dots, n)$, such that they are stochastically independent, where $x_1$,
$x_2, \dots, x_n$ are $n$ independently (but not necessarily identically) distributed
proper random variables, then each $x_j$ is normally distributed. Their methods
of proof are similar in nature, both being based on the use of characteristic 
functions, without any assumption about the existence of moments. The same 
theorem has been proved independently by Basu [1], under the assumption 
that the random variables are identically and independently distributed and 
have finite moments of all orders. This result is also obtained independently by 
Lukacs and King [11], under the assumption that the random variables are in- 
dependently (but not necessarily identically) distributed, each having finite 
moments up to order n, and by Linnik [10], under the assumption that the
random variables have only finite variances. The special case of this theorem 
for n = 2 was proved earlier independently by Kac [6], Gnedenko [5], and 
Darmois [3], without any assumption about the existence of moments. 

Thus we see that the problems on the consequences of stochastic independ- 
ence of linear statistics have been exhaustively studied. Hence the question 
that naturally arises is whether similar investigations into the distribution 
laws of the random variables are possible under the assumption of a suitable 
type of stochastic dependence of the linear statistics. In this direction, the 
author [8] has derived a characterization of the stable law with finite expecta- 
tion from the property of linearity of regression of one linear statistic on the 
other for the case n = 2. The author [7] has also obtained some characteriza- 
tions of the normal distribution from the consequence of the linearity of the 
multiple regression of one random variable on several others, when the vari- 
ables have the linear structural setup as in the case of the bi-factor theory of 
Spearman. 

Received December 30, 1955; revised July 11, 1956.

For the formulation of the problem investigated in the present paper, we require
a precise definition of the terms conditional distribution, linearity of regression,
and homoscedasticity. Let $F(x, y)$ and $F_0(x)$ denote respectively the
distribution function of the two-dimensional random variable $(x, y)$ and the
marginal distribution function of $x$. Then we define the conditional distribution
function of $y$ for fixed $x$ by $F_x(y)$ so that it satisfies the relation

$$\int_{-\infty}^{x} F_\xi(y)\,dF_0(\xi) = F(x, y). \tag{1.1}$$


In the present investigation, the following assumptions on the distributions 
of the random variables will be made: 

Assumption 1. The conditional distribution of y for fixed x as defined in
(1.1) is assumed to exist, wherever needed. 

Assumption 2. Each of the random variables concerned has a finite second 
moment. This assumption allows us to take derivatives of the characteristic 
functions of the corresponding random variables up to and including the second 
order. 

Assumption 3. Without any loss of generality in the proof, it is assumed 
that the expectation of each of the random variables concerned is equal to zero. 

The role of these assumptions is to ensure the existence of the expectation
and the variance of the conditional distribution of $y$ for fixed $x$, which may be
defined respectively as

$$E_x(y) = \int y\,dF_x(y), \tag{1.2}$$

$$V_x(y) = \int \{y - E_x(y)\}^2\,dF_x(y) = S_x(y) - \{E_x(y)\}^2, \tag{1.3}$$

where

$$S_x(y) = \int y^2\,dF_x(y).$$

In this case, the regression of $y$ on $x$ is said to be linear, if the relation

$$E_x(y) = \beta x \tag{1.4}$$

is satisfied for all $x$, except for a set of probability measure zero, as the expectations
of both the random variables $x$ and $y$ are already assumed to be zero.
The constant $\beta$ in equation (1.4) is called the coefficient of regression of $y$ on $x$.
Similarly the conditional distribution of $y$ for fixed $x$ is said to be homoscedastic,
if the conditional variance $V_x(y)$ as defined in (1.3) is a constant $\sigma_0^2$ not involving
$x$. Thus if the regression of $y$ on $x$ is linear and given by $\beta x$ and the conditional
variance of $y$ for fixed $x$ is a constant $\sigma_0^2$, we have the relation

$$S_x(y) = \sigma_0^2 + \beta^2 x^2 \tag{1.5}$$


to be satisfied for all $x$, in addition to the condition (1.4). For simplicity in
notation, throughout the present paper we shall use the term "the conditional
distribution of $y$ for fixed $x$ is L.R.H. $(\beta, \sigma_0^2)$", meaning thereby that the regression
of $y$ on $x$ is linear and given by $\beta x$ and that the conditional variance of $y$
for fixed $x$ is $\sigma_0^2$, being free of $x$, i.e., equivalent to the statement that both the
relations (1.4) and (1.5) are simultaneously satisfied.

In the following sections we shall derive some characterizations of the nor- 
mal distribution from the property of linearity of regression and homoscedas- 
ticity of suitable linear statistics. The main theorem (Theorem 3.2) is given in 
Section 3. The proof of this theorem uses as a starting point a very simple set 
of necessary and sufficient conditions for the linearity of regression and homo- 
scedasticity (Lemma 2.1) and finally a theorem of Linnik (Lemma 2.3) on an 
analytical extension of Cramér’s theorem [2] on the normal law. Several im- 
portant corollaries are deduced in the subsequent section. 


2. Certain useful lemmas. We shall now give some important lemmas which 
are instrumental in the proof of the theorems. 

LEMMA 2.1. (Necessary and sufficient conditions for the linearity of regression
and homoscedasticity). Let $x$ and $y$ be two proper random variables each
having a finite variance. Then the necessary and sufficient conditions for the conditional
distribution of $y$ for fixed $x$ to be L.R.H. $(\beta, \sigma_0^2)$ are that the equations

$$\left[\frac{\partial \varphi(u, v)}{\partial v}\right]_{v=0} = \beta\,\frac{d\varphi(u, 0)}{du}, \tag{2.1}$$

$$\left[\frac{\partial^2 \varphi(u, v)}{\partial v^2}\right]_{v=0} = -\sigma_0^2\,\varphi(u, 0) + \beta^2\,\frac{d^2\varphi(u, 0)}{du^2} \tag{2.2}$$

are to be satisfied for all real $u$, where $\varphi(u, v)$ and $\varphi(u, 0)$ represent respectively the
characteristic functions of the distribution of $(x, y)$ and the marginal distribution
of $x$.

This lemma helps us in introducing the differential equation connecting the 
characteristic functions of the variables concerned and has been proved in- 
dependently by Rao [12] and Rothschild and Mourier [13]. 
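Conditions (2.1) and (2.2) can be checked numerically in a case where the joint characteristic function is known in closed form. The sketch below is an illustration, not part of the paper: it assumes $x \sim N(0, \sigma_x^2)$ and $y = \beta x + e$ with $e \sim N(0, \sigma_0^2)$ independent of $x$, so that $\varphi(u, v) = \exp\{-\tfrac12[\sigma_x^2(u + \beta v)^2 + \sigma_0^2 v^2]\}$ is real valued. The constants and the helper names `d1`, `d2`, and `residuals` are hypothetical.

```python
import math

# Illustrative parameter values (assumptions, not from the paper).
BETA, VAR_X, VAR_0 = 0.8, 1.5, 0.4

def phi(u, v):
    """Characteristic function of (x, y), x ~ N(0, VAR_X), y = BETA*x + e,
    e ~ N(0, VAR_0) independent of x; real because the law is symmetric."""
    return math.exp(-0.5 * (VAR_X * (u + BETA * v) ** 2 + VAR_0 * v * v))

def d1(f, t, h=1e-5):
    # Central difference approximation to f'(t).
    return (f(t + h) - f(t - h)) / (2.0 * h)

def d2(f, t, h=1e-4):
    # Central difference approximation to f''(t).
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

def residuals(u):
    """Left side minus right side of (2.1) and of (2.2) at the point u."""
    marginal = lambda s: phi(s, 0.0)   # u -> phi(u, 0)
    in_v = lambda s: phi(u, s)         # v -> phi(u, v) for fixed u
    r1 = d1(in_v, 0.0) - BETA * d1(marginal, u)
    r2 = d2(in_v, 0.0) - (-VAR_0 * phi(u, 0.0) + BETA ** 2 * d2(marginal, u))
    return r1, r2
```

Both residuals should vanish up to finite-difference error at every real $u$, for example at `residuals(0.7)`.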

LEMMA 2.2. Let $x$ and $y$ be two proper random variables each having a finite
variance. Then if the conditional distribution of $y$ for fixed $x$ is L.R.H. $(\beta, \sigma_0^2)$,
the conditional distribution of $by$ $(b \ne 0)$ for fixed $ax$ $(a \ne 0)$ is L.R.H. $(\beta', \sigma_0'^2)$
where $\beta' = b\beta/a$ and $\sigma_0'^2 = b^2\sigma_0^2$.

LEMMA 2.3. (Analytical extension of Cramér's theorem on the normal law).
Let $X_1, X_2, \dots, X_n$ be $n$ independent proper random variables and let further
$\varphi_j(t)$ denote the characteristic function of the distribution of $X_j$ $(j = 1, 2, \dots, n)$.
If now the functions $\varphi_j(t)$ satisfy the equation

$$\prod_{j=1}^{n} \{\varphi_j(t)\}^{\alpha_j} = e^{Q(t)}, \tag{2.3}$$

for all real $t$ in a certain neighbourhood $|t| < \delta$, $(\delta > 0)$ of the origin, where the $\alpha_j$'s
are some positive numbers and $Q(t)$ a quadratic polynomial in $t$, then each $X_j$ is
normally distributed.







Proor. We give below a short proof of this very interesting lemma which is 
due to Linnik [9]. 

Without any loss of generality in the proof, we can take the quadratic polynomial
$Q(t)$ to be of the form $Q(t) = iat - \tfrac12 t^2$ and work with the characteristic
functions $\theta_j(t)$ of the symmetric random variables $x_j = X_j - X_j'$ $(j = 1, 2, \dots, n)$,
where $X_j'$ is distributed independently of $X_j$ and has the same distribution
as $X_j$. Then it can be easily shown that the characteristic functions $\theta_j(t)$ satisfy
the equation

$$\prod_{j=1}^{n} \{\theta_j(t)\}^{\alpha_j} = e^{-t^2} \tag{2.4}$$

for all real $t$ in a suitably chosen neighbourhood of the origin. Again noting
that each of the characteristic functions $\theta_j(t)$ is real, we have

$$\theta_j(t) = \int \cos tx\,dF_j(x) = 1 - 2\int \sin^2\frac{tx}{2}\,dF_j(x) \le \exp\left\{-2\int \sin^2\frac{tx}{2}\,dF_j(x)\right\}, \tag{2.5}$$

whence, using the equation (2.4), we get

$$\sum_{j=1}^{n} \alpha_j \int \sin^2\frac{tx}{2}\,dF_j(x) \le \frac{t^2}{2}, \tag{2.6}$$

thus yielding for every $j$, the inequality

$$\int \frac{\sin^2(tx/2)}{t^2}\,dF_j(x) \le \frac{1}{2\alpha_j} \tag{2.7}$$

holding for all real $t$ in a certain neighbourhood of the origin. Then by using
Fatou's theorem, it follows from (2.7) that the second moment of each $x_j$ exists.
Next we shall show by induction that each $x_j$ has finite moments of all orders.

Let us suppose that each $x_j$ has finite moments up to an even order $2k$. Then
differentiating both sides of the equation (2.4) with respect to $t$, $2k$ times, we get

$$S_1(t) + S_2(t) + S_3(t) = G_{2k}(t)\,e^{-t^2}, \tag{2.8}$$

where $S_1(t)$ contains all the derivatives of order $2k$ of the functions $\theta_j(t)$, $S_2(t)$
contains all the derivatives of only odd order not greater than $2k - 1$, $S_3(t)$
contains only derivatives of even order not greater than $2k - 2$, and $G_{2k}(t)$ is a
polynomial of degree $2k$ in $t$. We also note that $S_2(0) = 0$.

We should further note that for each term on the left-hand side of (2.8) which,
except for a constant coefficient, is a product of the derivatives of the functions
$\theta_j(t)$, if $p_{jr}$ is the order of the derivative of $\theta_j(t)$ and $q_{jr}$ the corresponding
power with which it appears, then

$$\sum p_{jr}\,q_{jr} = 2k, \tag{2.9}$$

where $\theta_j(t)$ itself is to be considered as a derivative of the order zero.

1 The proof of this lemma is given in A. A. Zinger and Yu. V. Linnik, "On an analytical
generalization of a theorem of Cramér and its application," Vestnik Leningrad Univ., Vol.
10 (1955), pp. 51-56. In this paper the authors have also given an alternative proof of the
Darmois-Skitovich theorem using this lemma.
Now from (2.8) we get easily

$$\frac{S_1(t) - S_1(0)}{t^2} + \frac{S_2(t)}{t^2} + \frac{S_3(t) - S_3(0)}{t^2} = \frac{G_{2k}(t)\,e^{-t^2} - G_{2k}(0)}{t^2}. \tag{2.10}$$

Then it is easy to verify that as $t \to 0$, the expression on the right-hand side
as well as each of the second and third summands on the left-hand side of Eq.
(2.10) tends to a finite limit. Consequently $[S_1(t) - S_1(0)]/t^2$ must also tend to
a finite limit as $t$ tends to zero.

Moreover, noting that 


/ II (9,(t)}*" 
(2.11) . = 2 reel 
S(t) = > a0 (t) a 


it can be shown from the above result that 


(2.12) Mie) = (0) 3 
Sha” ae = (—1)* a, [2 — dF (x 


= t j=l 


has a finite limit as $t \to 0$. Again proceeding in the same way as in (2.6) and
using Fatou's theorem, we prove that each $x_j$ has a finite moment of order
$2k + 2$, thus completing the induction.

We shall next show that each $\theta_j(t)$ is an entire function of $t$. Without any
loss of generality in the proof, we can, by making a suitable change of scale of
the variables if necessary, take each of the indices $\alpha_j \ge 1$. Now raising both
sides of Eq. (2.4) to the power $2k$, we differentiate $2k$ times with respect to $t$
and then put $t = 0$, thus obtaining

$$S_1^*(0) + S_3^*(0) = (-1)^{k}\,\frac{(2k)!\,k^{k}\,2^{k}}{k!}. \tag{2.13}$$

Denoting by $\mu_{2k,j}$ the moment of the order $2k$ of the random variable $x_j$, we
have

$$S_1^*(0) = (-1)^{k} \sum_{j=1}^{n} 2k\alpha_j\,\mu_{2k,j}, \tag{2.14}$$

while $S_3^*(0)$ consists of terms, each containing the moments of even order (the
order being at most $2k - 2$) with a positive coefficient having the same sign
$(-1)^{k}$ by virtue of (2.9). Then noting that $\alpha_j \ge 1$ and $k^{k} < e^{k}k!$, we get easily,
using (2.13) and (2.14) together, the inequality

$$\mu_{2k,j} \le \sum_{j=1}^{n} 2k\alpha_j\,\mu_{2k,j} \le \frac{(2k)!\,k^{k}\,2^{k}}{k!} < (2k)!\,(2e)^{k}, \tag{2.15}$$

whence it follows that each $\theta_j(t)$ has a power series development about $t = 0$
with a radius of convergence not less than $1/\sqrt{2e}$, the series representation
being of the form

$$\theta_j(t) = \sum_{k=0}^{\infty} (-1)^{k}\,\frac{\mu_{2k,j}}{(2k)!}\,t^{2k}. \tag{2.16}$$
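The chain of inequalities leading to (2.15) can be traced in the simplest case. The sketch below is an illustration under an assumption that is not in the text: it takes $n = 1$ and $\alpha_1 = 1$, so that (2.4) forces $\theta_1(t) = e^{-t^2}$, the characteristic function of $N(0, 2)$, whose even moments are $\mu_{2k} = (2k)!/k!$. The function names are hypothetical.

```python
import math

def even_moment(k):
    """2k-th moment of N(0, 2): 2^k (2k - 1)!! = (2k)! / k!."""
    return math.factorial(2 * k) // math.factorial(k)

def bound_holds(k):
    """Checks 2k*mu_{2k} <= (2k)! 2^k k^k / k! < (2k)! (2e)^k for n = 1, alpha_1 = 1."""
    middle = math.factorial(2 * k) * (2 * k) ** k / math.factorial(k)
    upper = math.factorial(2 * k) * (2.0 * math.e) ** k
    return 2 * k * even_moment(k) <= middle < upper

def stirling_type_bound(k):
    # The elementary inequality k^k < e^k k! used in the text.
    return k ** k < math.e ** k * math.factorial(k)
```

In this special case the middle quantity is exactly the $2k$th derivative of $e^{-2kt^2}$ at $t = 0$ up to the sign $(-1)^k$, and the coefficient bound gives the radius $1/\sqrt{2e}$ quoted above.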

We now consider the behaviour of each of $\theta_j(t)$ for purely imaginary values
of $t$. Substituting $t = iv$ ($v$ real) in (2.16) and then making the variable
transformation $w = v^2$ $(w \ge 0)$, we can easily verify that the functions $\xi_j(w)$, which
reduce to $\theta_j(t)$ if we put $w = -t^2$, satisfy the equation

$$\prod_{j=1}^{n} \{\xi_j(w)\}^{\alpha_j} = e^{w} \tag{2.17}$$

and have the power series development

$$\xi_j(w) = \sum_{k=0}^{\infty} \frac{\mu_{2k,j}}{(2k)!}\,w^{k}$$

about $w = 0$, the radius of convergence being not less than $1/2e$ and the coefficients
being all positive. Now let $w_0 > 0$ be a point within the radius of convergence of
each of the $\xi_j(w)$. Then taking $\eta_j(W) = \xi_j(w_0 + W)/\xi_j(w_0)$, it is easy to verify
that the functions $\eta_j(W)$ satisfy the equation

$$\prod_{j=1}^{n} \{\eta_j(W)\}^{\alpha_j} = e^{W}, \tag{2.18}$$

which is completely analogous to the equation (2.17), and have the power-series
development

$$\eta_j(W) = \sum_{k=0}^{\infty} \frac{\xi_j^{(k)}(w_0)}{\xi_j(w_0)}\,\frac{W^{k}}{k!}$$

in a certain neighbourhood of $W = 0$, the coefficients being all positive.

Then proceeding exactly in the same way as above, we can show that the
radius of convergence of each of the series developments for $\eta_j(W)$ is not less
than $1/2e$. Hence each $\xi_j(w)$, having no singularities for $0 \le w < w_0 + 1/2e$, has
a power-series development with radius of convergence not less than $w_0 + 1/2e$
($\xi_j(w)$ is a series with positive coefficients). This fact evidently leads to the
conclusion that each of $\xi_j(w)$ and hence each of $\theta_j(t)$ is an entire function. The
remainder of the proof is exactly similar to that of the theorem of Cramér [2].


3. The theorems. We are now in a position to prove the following theorems:

THEOREM 3.1. Let $(x_j, y_j)$, $j = 1, 2, \dots, n$, be $n$ independently (but not necessarily
identically) distributed two-dimensional proper random variables each having
a finite variance such that the conditional distribution of $y_j$ for fixed $x_j$ is L.R.H.
$(\beta_j, \sigma_{j0}^2)$ for $j = 1, 2, \dots, n$. Let $X = \sum_{j=1}^{n} a_j x_j$ and $Y = \sum_{j=1}^{n} b_j y_j$ be two
linear functions with $a_j b_j \ne 0$ $(j = 1, 2, \dots, n)$; then the conditional distribution
of $Y$ for fixed $X$ is always L.R.H. $(\beta, \sigma_0^2)$, whenever the relation

$$\frac{b_1\beta_1}{a_1} = \frac{b_2\beta_2}{a_2} = \cdots = \frac{b_n\beta_n}{a_n} = \beta$$

is satisfied.

PROOF. For convenience in procedure, let us substitute $\xi_j = a_j x_j$ and $\eta_j = b_j y_j$
for $j = 1, 2, \dots, n$. Then we can write $X = \sum_{j=1}^{n} \xi_j$ and $Y = \sum_{j=1}^{n} \eta_j$,
and further using Lemma 2.2, we get that the conditional distribution of $\eta_j$ for
fixed $\xi_j$ is L.R.H. $(\beta_j', \sigma_{j0}'^2)$ where $\beta_j' = b_j\beta_j/a_j$ and $\sigma_{j0}'^2 = b_j^2\sigma_{j0}^2$; $j = 1, 2, \dots, n$.

Let $\varphi_j(u, v)$ and $\varphi_j(u, 0)$ denote respectively the characteristic functions of
the distribution of $(\xi_j, \eta_j)$ and the marginal distribution of $\xi_j$ $(j = 1, 2, \dots, n)$
and similarly $\varphi(u, v)$ and $\varphi(u, 0)$, those of the distribution of $(X, Y)$ and the
marginal distribution of $X$ respectively.

Then we can write

$$\varphi(u, v) = E\{\exp(iuX + ivY)\} = E\Big\{\exp\Big(iu\sum_{j} \xi_j + iv\sum_{j} \eta_j\Big)\Big\} = \prod_{j=1}^{n} \varphi_j(u, v). \tag{3.1}$$
j=l 
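Again it is given that the conditional distribution of $\eta_j$ for fixed $\xi_j$ is L.R.H.
$(\beta_j', \sigma_{j0}'^2)$, for all $j = 1, 2, \dots, n$. Hence applying Lemma 2.1 and using the
conditions (2.1) and (2.2), we get

$$\left[\frac{\partial \varphi_j(u, v)}{\partial v}\right]_{v=0} = \beta_j'\,\frac{d\varphi_j(u, 0)}{du}, \tag{3.2}$$

$$\left[\frac{\partial^2 \varphi_j(u, v)}{\partial v^2}\right]_{v=0} = -\sigma_{j0}'^2\,\varphi_j(u, 0) + \beta_j'^2\,\frac{d^2\varphi_j(u, 0)}{du^2}. \tag{3.3}$$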


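Now differentiating both sides of (3.1) with respect to $v$, $r$ times $(r = 1, 2)$
and then putting $v = 0$ and using the relations (3.2) and (3.3), we get

$$\left[\frac{\partial \varphi(u, v)}{\partial v}\right]_{v=0} = \sum_{j=1}^{n} \beta_j'\,\frac{d\varphi_j(u, 0)}{du} \prod_{k \ne j} \varphi_k(u, 0), \tag{3.4}$$

$$\begin{aligned}
\left[\frac{\partial^2 \varphi(u, v)}{\partial v^2}\right]_{v=0} = {} & -\varphi(u, 0) \sum_{j=1}^{n} \sigma_{j0}'^2 + \sum_{j=1}^{n} \beta_j'^2\,\frac{d^2\varphi_j(u, 0)}{du^2} \prod_{k \ne j} \varphi_k(u, 0)\\
& + \sum_{j \ne k} \beta_j'\beta_k'\,\frac{d\varphi_j(u, 0)}{du}\,\frac{d\varphi_k(u, 0)}{du} \prod_{l \ne j,k} \varphi_l(u, 0).
\end{aligned} \tag{3.5}$$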


du Lt j,k 







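Again putting $v = 0$ on both sides of (3.1) and then differentiating both sides
with respect to $u$, $r$ times $(r = 1, 2)$, we get

$$\frac{d\varphi(u, 0)}{du} = \sum_{j=1}^{n} \frac{d\varphi_j(u, 0)}{du} \prod_{k \ne j} \varphi_k(u, 0), \tag{3.6}$$

$$\frac{d^2\varphi(u, 0)}{du^2} = \sum_{j=1}^{n} \frac{d^2\varphi_j(u, 0)}{du^2} \prod_{k \ne j} \varphi_k(u, 0) + \sum_{j \ne k} \frac{d\varphi_j(u, 0)}{du}\,\frac{d\varphi_k(u, 0)}{du} \prod_{l \ne j,k} \varphi_l(u, 0). \tag{3.7}$$

Now under the conditions of the theorem, we have $\beta_1' = \beta_2' = \cdots = \beta_n' = \beta$.
Then substituting this in (3.4) and (3.5) and finally comparing with (3.6) and
(3.7), we get

$$\left[\frac{\partial \varphi(u, v)}{\partial v}\right]_{v=0} = \beta\,\frac{d\varphi(u, 0)}{du}, \tag{3.8}$$

$$\left[\frac{\partial^2 \varphi(u, v)}{\partial v^2}\right]_{v=0} = -\varphi(u, 0) \sum_{j=1}^{n} \sigma_{j0}'^2 + \beta^2\,\frac{d^2\varphi(u, 0)}{du^2}. \tag{3.9}$$

Then the proof of the theorem follows at once on applying Lemma 2.1 to (3.8) and
(3.9).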
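Relations (3.8) and (3.9) do not require the $\xi_j$ to be normal, which a small numerical check makes concrete. The sketch below is an illustration under assumptions that are not in the paper: $\xi_1$ is uniform on $[-1, 1]$, $\xi_2$ is standard Laplace, and $\eta_j = \beta\xi_j + e_j$ with independent normal noise $e_j$, so that each pair is L.R.H. with the same slope $\beta$. All names and constants are hypothetical.

```python
import math

BETA = 0.6                  # assumed common value of b_j * beta_j / a_j
NOISE_VARS = (0.3, 0.5)     # assumed conditional variances of eta_j given xi_j

def psi_uniform(t):
    # Characteristic function of the uniform law on [-1, 1].
    return 1.0 if t == 0.0 else math.sin(t) / t

def psi_laplace(t):
    # Characteristic function of the standard Laplace law.
    return 1.0 / (1.0 + t * t)

MARGINALS = (psi_uniform, psi_laplace)

def phi(u, v):
    """Joint characteristic function of (X, Y) as the product (3.1), where
    each factor is psi_j(u + BETA*v) * exp(-var_j * v^2 / 2)."""
    value = 1.0
    for psi, var in zip(MARGINALS, NOISE_VARS):
        value *= psi(u + BETA * v) * math.exp(-0.5 * var * v * v)
    return value

def d1(f, t, h=1e-5):
    return (f(t + h) - f(t - h)) / (2.0 * h)

def d2(f, t, h=1e-4):
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

def residuals(u):
    """Left side minus right side of (3.8) and of (3.9) at the point u."""
    marginal = lambda s: phi(s, 0.0)
    in_v = lambda s: phi(u, s)
    sigma0_sq = sum(NOISE_VARS)
    r1 = d1(in_v, 0.0) - BETA * d1(marginal, u)
    r2 = d2(in_v, 0.0) - (-sigma0_sq * phi(u, 0.0) + BETA ** 2 * d2(marginal, u))
    return r1, r2
```

Both residuals vanish up to finite-difference error even though neither $\xi_1$ nor $\xi_2$ is normal, in line with the statement that equal ratios $b_j\beta_j/a_j$ impose no restriction on the distributions.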


From the above theorem, it follows easily that if there exist two linear functions
$X = \sum_{j=1}^{n} a_j x_j$ and $Y = \sum_{j=1}^{n} b_j x_j$, with $a_j b_j \ne 0$ $(j = 1, 2, \dots, n)$,
where $x_1, x_2, \dots, x_n$ are $n$ independently (but not necessarily identically)
distributed proper random variables each having a finite variance, then the
conditional distribution of $Y$ for fixed $X$ is always L.R.H. $(\beta, \sigma_0^2)$ whenever the
relation $b_1/a_1 = b_2/a_2 = \cdots = b_n/a_n = \beta$ is satisfied.

In the following we shall establish the normality of the random variables
$x_j$'s from the property of the linearity of regression and homoscedasticity of the
conditional distribution of $Y$ for fixed $X$ as introduced in Theorem 3.1, under
some conditions. For this purpose let $\sigma_j^2$ denote the variance of the random
variable $x_j$ $(j = 1, 2, \dots, n)$. Then the coefficient of regression of $Y$ on $X$ will
be given by $\beta = \sum' a_j b_j \beta_j \sigma_j^2 / \sum' a_j^2 \sigma_j^2$, the summation extending over all the
indices $j$ for which $(b_j\beta_j)/a_j \ne \beta$. We now state the following theorem.

THEOREM 3.2. With the same notations and assumptions as those used in Theorem
3.1, the necessary and sufficient condition for the conditional distribution of $Y$ for
fixed $X$ to be L.R.H. $(\beta, \sigma_0^2)$ is that

(i) each $x_j$ for which $(b_j \beta_j)/a_j \neq \beta$ is normally distributed, while each $y_j$ and
the other $x_j$'s have arbitrary distributions;

(ii) $\beta = \sum^{*} a_j b_j \beta_j \sigma_j^2 \Big/ \sum^{*} a_j^2 \sigma_j^2$
and
$\sigma_0^2 = \sum_{j=1}^{n} b_j^2 \sigma_{j0}^2 + \sum^{*} \left( \frac{b_j \beta_j}{a_j} - \beta \right)^2 a_j^2 \sigma_j^2,$

where $\sum^{*}$ stands for the summation over all the indices $j$ for which $(b_j \beta_j)/a_j \neq \beta$.
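The formulae in (ii) can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes zero means, and the function name `regression_summary` and the sample numbers are purely illustrative. It computes $\beta$ as Cov(X, Y)/Var(X), evaluates $\sigma_0^2$ from the expression in (ii), and compares it with the residual variance Var(Y) - $\beta^2$ Var(X). Indices with $(b_j\beta_j)/a_j = \beta$ contribute nothing to the starred sum, so the sketch sums over all $j$.

```python
def regression_summary(a, b, beta_j, var_x, cond_var):
    """Return (beta, sigma0_sq from (ii), sigma0_sq as Var(Y) - beta^2 Var(X)).

    a, b      : coefficients a_j, b_j of the linear forms X and Y
    beta_j    : regression coefficient of y_j on x_j
    var_x     : variances sigma_j^2 of x_j
    cond_var  : conditional variances sigma_{j0}^2 of y_j given x_j
    """
    w = [aj * aj * vj for aj, vj in zip(a, var_x)]            # Var(a_j x_j)
    r = [bj * bt / aj for aj, bj, bt in zip(a, b, beta_j)]    # slope of b_j y_j on a_j x_j
    beta = sum(rj * wj for rj, wj in zip(r, w)) / sum(w)      # Cov(X, Y) / Var(X)
    base = sum(bj * bj * cv for bj, cv in zip(b, cond_var))
    from_theorem = base + sum((rj - beta) ** 2 * wj for rj, wj in zip(r, w))
    var_y = sum(bj * bj * (cv + bt * bt * vj)
                for bj, cv, bt, vj in zip(b, cond_var, beta_j, var_x))
    direct = var_y - beta ** 2 * sum(w)
    return beta, from_theorem, direct


beta, s_thm, s_direct = regression_summary(
    a=[1.0, 2.0], b=[1.0, 1.0], beta_j=[0.5, 1.5],
    var_x=[1.0, 2.0], cond_var=[0.3, 0.7])
print(beta, s_thm, s_direct)
```

The agreement of the last two numbers only confirms the second-moment bookkeeping; it says nothing about normality, which is the substance of the theorem.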







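PROOF.

Necessity: First of all, substituting $\xi_j = a_j x_j$ and $\eta_j = b_j y_j$ as in Theorem 3.1,
for $j = 1, 2, \cdots, n$, we get $X = \sum_{j=1}^{n} \xi_j$ and $Y = \sum_{j=1}^{n} \eta_j$.

Since it is given that the conditional distribution of $Y$ for fixed $X$ is L.R.H.
$(\beta, \sigma_0^2)$, we get on using the conditions (2.1) and (2.2) of Lemma 2.1,

(3.10)  $\left.\frac{\partial \varphi(u, v)}{\partial v}\right|_{v=0} = \beta \frac{d\varphi(u, 0)}{du},$

(3.11)  $\left.\frac{\partial^2 \varphi(u, v)}{\partial v^2}\right|_{v=0} = -\sigma_0^2\, \varphi(u, 0) + \beta^2 \frac{d^2\varphi(u, 0)}{du^2}.$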

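Next using the relations (3.1), (3.4), (3.5), (3.6), and (3.7) together in the
equations (3.10) and (3.11), we have

(3.12)  $\sum_{j=1}^{n} \beta_j \frac{d\varphi_j(u, 0)}{du} \prod_{k \neq j} \varphi_k(u, 0) = \beta \left[ \sum_{j=1}^{n} \frac{d\varphi_j(u, 0)}{du} \prod_{k \neq j} \varphi_k(u, 0) \right],$

(3.13)  $-\prod_{j=1}^{n} \varphi_j(u, 0) \sum_{j=1}^{n} \sigma_{j0}^2 + \sum_{j=1}^{n} \beta_j^2 \frac{d^2\varphi_j(u, 0)}{du^2} \prod_{k \neq j} \varphi_k(u, 0) + \sum_{j \neq k} \beta_j \beta_k \frac{d\varphi_j(u, 0)}{du} \frac{d\varphi_k(u, 0)}{du} \prod_{l \neq j, k} \varphi_l(u, 0)$
        $= -\sigma_0^2 \prod_{j=1}^{n} \varphi_j(u, 0) + \beta^2 \left[ \sum_{j=1}^{n} \frac{d^2\varphi_j(u, 0)}{du^2} \prod_{k \neq j} \varphi_k(u, 0) + \sum_{j \neq k} \frac{d\varphi_j(u, 0)}{du} \frac{d\varphi_k(u, 0)}{du} \prod_{l \neq j, k} \varphi_l(u, 0) \right].$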
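Now noting that each of the characteristic functions is continuous for all real
$u$ and equal to unity at the origin, we restrict the values of $u$ to a suitably chosen
neighborhood $|u| < \delta$ ($\delta > 0$) of the origin, such that each of the factors
occurring in the product $\prod_{j=1}^{n} \varphi_j(u, 0)$ is different from zero, and then divide
both sides of (3.12) and (3.13) by $\prod_{j=1}^{n} \varphi_j(u, 0)$, thus obtaining

(3.14)  $\sum_{j=1}^{n} \beta_j \frac{d\varphi_j(u, 0)}{du} \Big/ \varphi_j(u, 0) = \beta \sum_{j=1}^{n} \frac{d\varphi_j(u, 0)}{du} \Big/ \varphi_j(u, 0),$

(3.15)  $-\sum_{j=1}^{n} \sigma_{j0}^2 + \sum_{j=1}^{n} \beta_j^2 \frac{d^2\varphi_j(u, 0)}{du^2} \Big/ \varphi_j(u, 0) + \sum_{j \neq k} \beta_j \beta_k \left\{ \frac{d\varphi_j(u, 0)}{du} \Big/ \varphi_j(u, 0) \right\} \left\{ \frac{d\varphi_k(u, 0)}{du} \Big/ \varphi_k(u, 0) \right\}$
        $= -\sigma_0^2 + \beta^2 \left[ \sum_{j=1}^{n} \frac{d^2\varphi_j(u, 0)}{du^2} \Big/ \varphi_j(u, 0) + \sum_{j \neq k} \left\{ \frac{d\varphi_j(u, 0)}{du} \Big/ \varphi_j(u, 0) \right\} \left\{ \frac{d\varphi_k(u, 0)}{du} \Big/ \varphi_k(u, 0) \right\} \right].$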


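Then using the transformation $\psi_j(u) = \ln \varphi_j(u, 0)$ for $j = 1, 2, \cdots, n$ and
expressing (3.14) and (3.15) in terms of the derivatives of $\psi_j(u)$, we have

(3.16)  $\sum_{j=1}^{n} \beta_j \frac{d\psi_j}{du} = \beta \sum_{j=1}^{n} \frac{d\psi_j}{du},$

(3.17)  $-\sum_{j=1}^{n} \sigma_{j0}^2 + \sum_{j=1}^{n} \beta_j^2 \frac{d^2\psi_j}{du^2} + \left( \sum_{j=1}^{n} \beta_j \frac{d\psi_j}{du} \right)^2 = -\sigma_0^2 + \beta^2 \left[ \sum_{j=1}^{n} \frac{d^2\psi_j}{du^2} + \left( \sum_{j=1}^{n} \frac{d\psi_j}{du} \right)^2 \right].$

Using (3.16), (3.17) further simplifies to

(3.18)  $-\sum_{j=1}^{n} \sigma_{j0}^2 + \sum_{j=1}^{n} \beta_j^2 \frac{d^2\psi_j}{du^2} = -\sigma_0^2 + \beta^2 \sum_{j=1}^{n} \frac{d^2\psi_j}{du^2}.$

Again differentiating both sides of (3.16) once more with respect to $u$, we get

(3.19)  $\sum_{j=1}^{n} \beta_j \frac{d^2\psi_j}{du^2} = \beta \sum_{j=1}^{n} \frac{d^2\psi_j}{du^2}.$

Then using (3.18) and (3.19) together, we have

(3.20)  $\sum_{j=1}^{n} (\beta_j - \beta)^2 \frac{d^2\psi_j}{du^2} = \sum_{j=1}^{n} \beta_j^2 \frac{d^2\psi_j}{du^2} - 2\beta \sum_{j=1}^{n} \beta_j \frac{d^2\psi_j}{du^2} + \beta^2 \sum_{j=1}^{n} \frac{d^2\psi_j}{du^2} = -\left( \sigma_0^2 - \sum_{j=1}^{n} \sigma_{j0}^2 \right) = c.$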
Finally integrating Eq. (3.20) twice with respect to $u$, we get

(3.21)  $\prod_{j=1}^{n} \{\varphi_j(u, 0)\}^{(\beta_j - \beta)^2} = \exp\{Q(u)\},$

which holds for all $u$ in the interval $|u| < \delta$ ($\delta > 0$), where $Q(u)$ is a quadratic
polynomial in $u$.

Then using the theorem of Linnik (Lemma 2.3), it follows at once from (3.21)
that each $\xi_j$, for which $\beta_j \neq \beta$, is normally distributed. Thus each $x_j$, for which
$(b_j \beta_j)/a_j \neq \beta$, is normally distributed.

Sufficiency: Without any loss of generality, we assume that for the first $r$
pairs ($r \le n$) of the random variables $(x_j, y_j)$, the relation $(b_j \beta_j)/a_j \neq \beta$ is
satisfied so that for the remaining $n - r$ pairs, $b_j \beta_j / a_j = \beta$. Then from the
conditions given in Theorem 3.2, the distribution of each $x_j$ is normal for $j =
1, 2, \cdots, r$ and arbitrary for $j = r + 1, \cdots, n$. That is, putting $\xi_j = a_j x_j$
and $\eta_j = b_j y_j$, we get the distribution of $\xi_j$ as normal for $j = 1, 2, \cdots, r$ and
arbitrary for $j = r + 1, r + 2, \cdots, n$. We are also given that $\beta_j \neq \beta$
($j = 1, 2, \cdots, r$), while $\beta_{r+1} = \beta_{r+2} = \cdots = \beta_n = \beta$.

Let us write $X = X_0 + \xi_{r+1} + \cdots + \xi_n$ and $Y = Y_0 + \eta_{r+1} + \cdots + \eta_n$,
where $X_0 = \sum_{j=1}^{r} \xi_j$ and $Y_0 = \sum_{j=1}^{r} \eta_j$.

Let $\varphi_0(u, v)$ and $\varphi_0(u, 0)$ denote the characteristic functions of the distribution
of $(X_0, Y_0)$ and the marginal distribution of $X_0$ respectively and further $\sigma_j^2$
the variance of $\xi_j$ for $j = 1, 2, \cdots, n$.

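Then it can be shown, proceeding exactly in the same way as in (3.4) through
(3.7) above, that

(3.22)  $\left.\frac{\partial \varphi_0(u, v)}{\partial v}\right|_{v=0} \Big/ \varphi_0(u, 0) = -u \sum_{j=1}^{r} \beta_j \sigma_j^2,$

(3.23)  $\left.\frac{\partial^2 \varphi_0(u, v)}{\partial v^2}\right|_{v=0} \Big/ \varphi_0(u, 0) = -\sum_{j=1}^{r} \sigma_{j0}^2 - \sum_{j=1}^{r} \beta_j^2 \sigma_j^2 + u^2 \left( \sum_{j=1}^{r} \beta_j \sigma_j^2 \right)^2,$

(3.24)  $\frac{d\varphi_0(u, 0)}{du} \Big/ \varphi_0(u, 0) = -u \sum_{j=1}^{r} \sigma_j^2,$

(3.25)  $\frac{d^2\varphi_0(u, 0)}{du^2} \Big/ \varphi_0(u, 0) = -\sum_{j=1}^{r} \sigma_j^2 + u^2 \left( \sum_{j=1}^{r} \sigma_j^2 \right)^2,$

where

$\varphi_0(u, 0) = \exp\left\{ -\tfrac{1}{2} u^2 \sum_{j=1}^{r} \sigma_j^2 \right\}.$

Then using (3.22) and (3.24) together, we get

(3.26)  $\left.\frac{\partial \varphi_0(u, v)}{\partial v}\right|_{v=0} = \beta \frac{d\varphi_0(u, 0)}{du},$

where $\beta$ is given by

$\beta = \sum_{j=1}^{r} \beta_j \sigma_j^2 \Big/ \sum_{j=1}^{r} \sigma_j^2.$

Again eliminating $u^2$ from both the equations (3.23) and (3.25) and using the
value of $\beta$ as obtained in (3.26), we have

(3.27)  $\left.\frac{\partial^2 \varphi_0(u, v)}{\partial v^2}\right|_{v=0} \Big/ \varphi_0(u, 0) = -\sum_{j=1}^{r} \sigma_{j0}^2 - \sum_{j=1}^{r} \beta_j^2 \sigma_j^2 + \beta^2 \left[ \frac{d^2\varphi_0(u, 0)}{du^2} \Big/ \varphi_0(u, 0) + \sum_{j=1}^{r} \sigma_j^2 \right],$

which after a little simplification reduces to

(3.28)  $\left.\frac{\partial^2 \varphi_0(u, v)}{\partial v^2}\right|_{v=0} = -\sigma_{00}^2\, \varphi_0(u, 0) + \beta^2 \frac{d^2\varphi_0(u, 0)}{du^2},$

where

$\sigma_{00}^2 = \sum_{j=1}^{r} \sigma_{j0}^2 + \sum_{j=1}^{r} (\beta_j - \beta)^2 \sigma_j^2.$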







Then applying Lemma 2.1 to (3.26) and (3.28) it follows easily that the
conditional distribution of $Y_0$ for fixed $X_0$ is L.R.H. $(\beta, \sigma_{00}^2)$, where $\beta$ and $\sigma_{00}^2$ are
already defined in (3.26) and (3.28) respectively.

Again since $\beta_{r+1} = \beta_{r+2} = \cdots = \beta_n = \beta$, it follows by applying Theorem
3.1 that the conditional distribution of $Y$ for fixed $X$ is L.R.H. $(\beta, \sigma_0^2)$, where $\beta$
is the same as that defined in (3.26) and

$\sigma_0^2 = \sum_{j=1}^{n} \sigma_{j0}^2 + \sum_{j=1}^{r} (\beta_j - \beta)^2 \sigma_j^2.$

Hence the proof of the theorem.


4. Some Corollaries. We shall now deduce some important corollaries in 
this section. 


COROLLARY 4.1. Let there exist two linear functions
$X = \sum_{j=1}^{n} a_j x_j$ and $Y = \sum_{j=1}^{n} b_j x_j$ with $a_j b_j \neq 0$ ($j = 1, 2, \cdots, n$),
where $x_1, x_2, \cdots, x_n$ are $n$ independently (but not necessarily identically)
distributed proper random variables each having a finite variance $\sigma_j^2$ and zero
expectation. Then the necessary and sufficient condition for the conditional distribution of
$Y$ for fixed $X$ to be L.R.H. $(\beta, \sigma_0^2)$ is that
(i) each $x_j$ for which $b_j/a_j \neq \beta$ is normally distributed, while the remaining
$x_j$'s have arbitrary distributions;
(ii) $\beta = \sum^{*} a_j b_j \sigma_j^2 \Big/ \sum^{*} a_j^2 \sigma_j^2$ and
$\sigma_0^2 = \sum^{*} \left( \frac{b_j}{a_j} - \beta \right)^2 a_j^2 \sigma_j^2,$
the summation extending over all the indices $j$ such that $b_j/a_j \neq \beta$.

COROLLARY 4.2. Let $(x_j, y_j)$, $j = 1, 2, \cdots, n$, be $n$ independently (but not
necessarily identically) distributed two-dimensional proper random variables each having
a finite variance and zero expectation, such that the conditional distribution of $y_j$
for fixed $x_j$ is L.R.H. $(\beta_j, \sigma_{j0}^2)$ for $j = 1, 2, \cdots, n$. If there exist two linear
functions $X = \sum_{j=1}^{n} a_j x_j$ and $Y = \sum_{j=1}^{n} b_j y_j$ with $a_j b_j \neq 0$ ($j = 1, 2, \cdots, n$) such
that they are stochastically independent, then each $x_j$ for which $\beta_j \neq 0$ is normally
distributed, while the remaining $x_j$'s and each $y_j$ have arbitrary distributions.

COROLLARY 4.3. Let $(x_j, y_j)$, $j = 1, 2$, be two independently (but not necessarily
identically) distributed two-dimensional proper random variables each having a
finite variance such that the conditional distribution of $y_j$ for fixed $x_j$ is L.R.H.
$(\beta_j, \sigma_{j0}^2)$, $j = 1, 2$. If there exist two linear functions $X = a_1 x_1 + a_2 x_2$ and $Y =
b_1 y_1 + b_2 y_2$ with $a_j b_j \neq 0$ ($j = 1, 2$), then the necessary and sufficient condition for
the conditional distribution of $Y$ for fixed $X$ to be L.R.H. $(\beta, \sigma_0^2)$ is that each $x_j$
is normal whenever $b_1 \beta_1 / a_1 \neq b_2 \beta_2 / a_2$.







PROOF.

Necessity: First of all we substitute $\xi_j = a_j x_j$ and $\eta_j = b_j y_j$ ($j = 1, 2$) and
then proceed in exactly the same way as in Theorem 3.2. Then the equation
(3.12) reduces to

(4.1)  $(\beta_1 - \beta) \frac{d\varphi_1(u, 0)}{du}\, \varphi_2(u, 0) + (\beta_2 - \beta) \frac{d\varphi_2(u, 0)}{du}\, \varphi_1(u, 0) = 0.$

Now we shall show that under the condition $\beta_1 \neq \beta_2$, neither $\beta_1 - \beta$ nor
$\beta_2 - \beta$ can be equal to zero. For if $\beta_1 - \beta = 0$, while $\beta_2 - \beta \neq 0$, the equation
(4.1) becomes

(4.2)  $\frac{d\varphi_2(u, 0)}{du}\, \varphi_1(u, 0) = 0.$

Thus in a suitably chosen neighbourhood $|u| < \delta$ ($\delta > 0$) of the origin where
$\varphi_1(u, 0) \neq 0$, we have

$\frac{d\varphi_2(u, 0)}{du} = 0,$

thus leading to the conclusion that the distribution of $\xi_2$ is improper, the whole
mass being concentrated at the origin. Similarly if $\beta_2 - \beta = 0$, while $\beta_1 - \beta \neq 0$,
it can be shown in an exactly similar manner that the distribution of $\xi_1$ is
improper. Hence the only alternative left is when both $\beta_1 - \beta = 0$ and $\beta_2 - \beta = 0$
simultaneously, but in this case we have $\beta_1 = \beta_2$, which is contrary to the
conditions of the theorem. The rest of the proof is as in Theorem 3.2.


REFERENCES 


[1] D. Basu, "On the independence of linear functions of independent chance variables,"
Bull. Inst. Internat. Stat., Vol. 33 (1953), pp. 83-96.
[2] H. Cramér, "Ueber eine Eigenschaft der normalen Verteilungsfunktion," Math.
Zeit., Vol. 41 (1936), pp. 405-414.
[3] G. Darmois, "Sur une propriété caractéristique de la loi de probabilité de Laplace,"
C. R. Acad. Sci. Paris, Vol. 232 (1951), pp. 1999-2000.
[4] G. Darmois, "Analyse générale des liaisons stochastiques; étude particulière de
l'analyse factorielle linéaire," Rev. Inst. Internat. Stat., Vol. 21 (1953), pp. 2-8.
[5] B. V. Gnedenko, "On a theorem of S. N. Bernstein," Izvestiya Akad. Nauk SSSR,
Ser. Mat., Vol. 12 (1948), pp. 97-100.
[6] M. Kac, "On a characterization of the normal distribution," Amer. J. Math., Vol.
61 (1939), pp. 726-728.
[7] R. G. Laha, "On characterizations of probability distributions and statistics," Ph.D.
thesis submitted to Calcutta University, 1955.
[8] R. G. Laha, "On a characterization of the stable law with finite expectation," Ann.
Math. Stat., Vol. 27 (1956), pp. 187-195.
[9] Yu. V. Linnik, "On an analytical extension of Cramér's theorem on the normal law,"
Reports of the Leningrad and Moscow University probability seminars, 1954.
[10] Yu. V. Linnik, "Independent and equally distributed statistics," seminar notes,
Indian Statistical Institute, Calcutta, 1954.
[11] E. Lukacs and E. P. King, "A property of the normal distribution," Ann. Math.
Stat., Vol. 25 (1954), pp. 389-394.
[12] C. R. Rao, M. A. thesis submitted to Calcutta University, 1943.
[13] C. Rothschild and E. Mourier, "Sur les lois de probabilité à régression linéaire et
écart type lié constant," C. R. Acad. Sci. Paris, Vol. 225 (1947), pp. 1117-1119.
[14] V. P. Skitovich, "On a property of the normal distribution," Doklady Akad. Nauk
SSSR (N.S.), Vol. 89 (1953), pp. 217-219.





ON THE ESTIMATION OF AUTOCORRELATION IN TIME 
SERIES 


By Z. A. Lomnicki and S. K. Zaremba


Boulton Paul Aircraft, Ltd., Wolverhampton (England)


1. Introduction. In a recent paper, F. H. C. Marriott and J. A. Pope [8] in- 
vestigated, in some special cases, the bias arising in the estimation of the auto- 
correlation function of a discrete-parameter stochastic process when its mean 
is not known. M. G. Kendall [4] developed a general method for the determina- 
tion of this bias in the case of an arbitrary Gaussian process. 

The removal of the mean from a stochastic process may be regarded as a 
particular case of the elimination of a polynomial trend. The object of the 
present paper is to determine how the removal of this trend affects both the 
biases and the covariances of the estimators of the covariances and of the auto- 
correlation coefficients; it is not assumed that the process is necessarily Gaus- 
sian. 

In the two papers mentioned above, the passage from the estimation of the
covariances to that of the correlation coefficients was achieved by what may be
called the method of statistical differentials. The estimator $r_{k,N}$ of $\rho_k$ was regarded
as a function of certain covariance estimators and, in the derivation of relevant
formulae, the difference $r_{k,N} - \rho_k$ was replaced by the first differential of this
function. The general validity of this kind of argument needs to be clarified,
as remarked by Kendall himself in the last paragraph of his note. The same
applies to the derivation of cov $(r_{k,N}, r_{l,N})$ by Bartlett [1] in the case in which
there is no trend. In order to make this kind of argument rigorous, we prove a
general theorem conceived in the same spirit as a proposition given by Cramér
([3], pp. 353-356) for functions of sample moments, and justifying the use of
the method of "statistical differentials" under specified assumptions.


2. Basic definitions and assumptions. 

ASSUMPTION 1. In what follows it will be assumed that $\{y_t\}$ is a discrete-
parameter stochastic process composed of a determinate polynomial trend
$f_m(t)$ of degree at most $m$, and of a linear stochastic process $\{x_t\}$:

(2.1)  $y_t = f_m(t) + x_t,$

where $\{x_t\}$ is of the form

(2.2)  $x_t = \sum_{s=0}^{\infty} h_s \epsilon_{t-s},$

the series $\sum_{s=0}^{\infty} h_s$ being absolutely convergent, and $\{\epsilon_t\}$ being a wide-sense
stationary process with zero means:

(2.3)  $E(\epsilon_t) = 0, \quad E(\epsilon_t^2) = \sigma^2, \quad E(\epsilon_t \epsilon_s) = 0$ for $t \neq s$.

Received January 3, 1956.







In dealing with estimation problems, some further assumptions on $\{\epsilon_t\}$ are
needed, and it is customary to assume that the random variables $\epsilon_t$ are
independent and have identical distributions. This assumption can be weakened by
introducing the following definition:

DEFINITION 2. A stochastic process $\{\epsilon_t\}$ will be described as stationary up to
the order $p$, with the corresponding moments behaving as if the $\epsilon$'s were
independent, if all the moments up to the order $p$ exist, if, for any integer $\tau$ and for
any set of integers $t_1, t_2, \cdots, t_s$ ($s \le p$),

$E(\epsilon_{t_1+\tau} \epsilon_{t_2+\tau} \cdots \epsilon_{t_s+\tau}) = E(\epsilon_{t_1} \epsilon_{t_2} \cdots \epsilon_{t_s}),$

and if, for any set of pairwise different integers $t_1, t_2, \cdots, t_s$ and for any set
of positive integers $\lambda_1, \lambda_2, \cdots, \lambda_s$ such that $\lambda_1 + \lambda_2 + \cdots + \lambda_s \le p$, we have

$E(\epsilon_{t_1}^{\lambda_1} \epsilon_{t_2}^{\lambda_2} \cdots \epsilon_{t_s}^{\lambda_s}) = E(\epsilon_{t_1}^{\lambda_1}) E(\epsilon_{t_2}^{\lambda_2}) \cdots E(\epsilon_{t_s}^{\lambda_s}).$

ASSUMPTION 3. We assume that the process $\{\epsilon_t\}$ is stationary up to the eighth
order, and that the corresponding moments behave as if the $\epsilon$'s were
independent.
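It follows from (2.2) that

$R_k = \mathrm{cov}(x_t, x_{t+k}) = \sigma^2 \sum_{s=0}^{\infty} h_s h_{s+k};$

moreover, in view of the absolute convergence of $\sum h_s$, the series $\sum_{q=-\infty}^{+\infty} R_q$
is absolutely convergent. The random variable

(2.4)  $R_{k,N} = \frac{1}{N} \sum_{t=1}^{N} x_t x_{t+k}$

is obviously an unbiased estimator of $R_k$. It is also consistent; indeed, if we
make

(2.5)  $h_t = 0$ for $t < 0$,

it follows easily from (2.2) that

(2.6)  $\mathrm{cov}(x_t x_{t+k}, x_p x_{p+l}) = \kappa_4 \sum_{\tau=-\infty}^{+\infty} h_{t-\tau} h_{t+k-\tau} h_{p-\tau} h_{p+l-\tau} + R_{t-p} R_{t+k-p-l} + R_{t-p-l} R_{t+k-p},$

and, therefore,

(2.7)  $\lim_{N \to \infty} N\, \mathrm{cov}(R_{k,N}; R_{l,N}) = \frac{\kappa_4}{\sigma^4} R_k R_l + \sum_{q=-\infty}^{+\infty} (R_q R_{q+k-l} + R_{q+k} R_{q-l}) = v_{k,l}$

(refer to [1]), and, in particular,

(2.8)  $\lim_{N \to \infty} N\, \mathrm{var}\, R_{k,N} = \frac{\kappa_4}{\sigma^4} R_k^2 + \sum_{q=-\infty}^{+\infty} (R_q^2 + R_{q+k} R_{q-k}) = v_{k,k},$

where $\kappa_4 = E(\epsilon_t^4) - 3\sigma^4$.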
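For a filter with finitely many non-zero weights the autocovariances and the limiting variance in (2.8) are finite sums and can be evaluated directly. The sketch below is illustrative only: the function names `autocov` and `asymptotic_var` are hypothetical, the weights are passed as a list `h` holding $h_0, h_1, \cdots$, and `kappa4` stands for the fourth-cumulant coefficient multiplying the first term of (2.8) (zero in the Gaussian case).

```python
def autocov(h, k, sigma2=1.0):
    """R_k = sigma^2 * sum_s h_s h_{s+k} for a finite filter h = [h_0, ..., h_L]."""
    k = abs(k)  # R_{-k} = R_k
    return sigma2 * sum(h[s] * h[s + k] for s in range(len(h) - k))


def asymptotic_var(h, k, sigma2=1.0, kappa4=0.0):
    """v_{k,k} as in (2.8); the sum over q is finite since R_q = 0 for |q| >= len(h)."""
    span = len(h) + abs(k)
    total = kappa4 / sigma2 ** 2 * autocov(h, k, sigma2) ** 2
    for q in range(-span, span + 1):
        total += (autocov(h, q, sigma2) ** 2
                  + autocov(h, q + k, sigma2) * autocov(h, q - k, sigma2))
    return total


# White noise (h_0 = 1): v_{0,0} = 2 and v_{1,1} = 1 when kappa4 = 0.
print(asymptotic_var([1.0], 0), asymptotic_var([1.0], 1))
# A two-term moving average with weights (1, 0.5).
print(autocov([1.0, 0.5], 0), autocov([1.0, 0.5], 1))
```

The white-noise values agree with the familiar facts that the sample variance of N independent standard normal variables has variance close to 2/N and the lag-one sample covariance has variance close to 1/N.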







There is no difficulty, either, in showing that (2.7) and (2.8) retain their
validity for the alternative estimators of $R_k$:

$C_{k,N} = \frac{1}{N - k} \sum_{t=1}^{N-k} x_t x_{t+k} = R_{k,N-k},$

which exhaust the information supplied by the sample of size $N$. Thus

(2.9)  $\lim_{N \to \infty} N\, \mathrm{cov}(C_{k,N}; C_{l,N}) = v_{k,l}$

and, in particular,

(2.10)  $\lim_{N \to \infty} N\, \mathrm{var}\, C_{k,N} = v_{k,k}.$

The validity of (2.10) is obvious, while (2.9) follows easily from (2.7), if we
note that

$\mathrm{var}(C_{k,N} - R_{k,N}) = O(1/N^2),$

so that, in view of the Schwarz Inequality, of (2.8) and (2.10),

$N[\mathrm{cov}(C_{k,N}; C_{l,N}) - \mathrm{cov}(R_{k,N}; R_{l,N})] = O(1/\sqrt{N}).$
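As a concrete reading of the estimator $C_{k,N}$, a minimal sketch follows. The function name `cov_estimator` is hypothetical, and the sample is assumed to have zero mean, as for the $\{x_t\}$ process; the list `x` holds $x_1, \cdots, x_N$.

```python
def cov_estimator(x, k):
    """C_{k,N} = (1/(N-k)) * sum_{t=1}^{N-k} x_t x_{t+k} for a zero-mean sample."""
    n = len(x)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < N")
    return sum(x[t] * x[t + k] for t in range(n - k)) / (n - k)


print(cov_estimator([1.0, -1.0, 1.0, -1.0], 1))  # every lag-one product equals -1
```

Only the N - k products available inside the sample are averaged, which is why the divisor is N - k rather than N.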


The following proposition, concerning the asymptotic behaviour of the fourth
moment of $R_{k,N}$, will be required in Section 6:

PROPOSITION 4. Under Assumptions 1 and 3,

(2.11)  $\lim_{N \to \infty} N^2 E(R_{k,N} - R_k)^4 = 3 v_{k,k}^2,$

where $v_{k,k}$ is given by (2.8).

Since the fourth moment of $R_{k,N}$ involves the eighth moments of $x_t$ and $\epsilon_t$,
the proof is essentially based on Assumption 3. The argument leading to (2.11)
is completely elementary, but a straightforward proof is fairly laborious; a more
general proposition concerning all the moments (both univariate and mixed) of
the covariance estimators has been proved by the authors of the present paper,
and it is hoped that this result will soon be published.


3. The bias of the covariance estimators. For any fixed $k$ smaller
than $N - m$, write $\nu = N - k$. Let $\phi_0(t), \phi_1(t), \cdots, \phi_m(t)$ be the first $m + 1$
Chebyshev polynomials orthogonal on the set $t = 1, 2, \cdots, \nu$. Of course, these
polynomials depend on $\nu$; however, in order to simplify the notations, we drop
the subscript $\nu$ both here and in subsequent formulae, with the exception of
those cases in which this omission could cause ambiguity. We have (see, for
example, [5], pp. 159-161)

(3.1)  $\phi_i(t) = \sum_{j=0}^{i} (-1)^{i+j} \frac{(i!)^2 (i + j)!\, (\nu - j - 1)!}{(2i)!\, (j!)^2 (i - j)!\, (\nu - i - 1)!}\, (t - 1)^{(j)},$

where $(t - 1)^{(j)} = (t - 1)(t - 2) \cdots (t - j)$ denotes the factorial power, and

(3.2)  $\sum_{t=1}^{\nu} \phi_i(t) \phi_j(t) = 0$ if $i \neq j$,
       $\sum_{t=1}^{\nu} \phi_i(t) \phi_j(t) = Q_i = \frac{(i!)^4}{(2i)!\, (2i + 1)!}\, \nu (\nu^2 - 1) \cdots (\nu^2 - i^2)$ if $i = j$.
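The orthogonality relations can be verified in exact rational arithmetic for small cases. The sketch below assumes the monic form of the Chebyshev polynomials, written in factorial powers of t - 1, namely $\phi_i(t) = \sum_{j=0}^{i} (-1)^{i+j} \frac{(i!)^2 (i+j)! (\nu-j-1)!}{(2i)! (j!)^2 (i-j)! (\nu-i-1)!} (t-1)(t-2)\cdots(t-j)$, together with the closed form $Q_i = \frac{(i!)^4}{(2i)!(2i+1)!}\, \nu(\nu^2-1)\cdots(\nu^2-i^2)$ for the sums of squares. The helper names `falling`, `phi` and `q_closed_form` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def falling(x, j):
    """Factorial power x (x - 1) ... (x - j + 1); equal to 1 when j = 0."""
    out = 1
    for r in range(j):
        out *= x - r
    return out


def phi(i, t, nu):
    """Monic Chebyshev polynomial of degree i on t = 1, ..., nu (needs nu > i)."""
    total = Fraction(0)
    for j in range(i + 1):
        num = factorial(i) ** 2 * factorial(i + j) * factorial(nu - j - 1)
        den = (factorial(2 * i) * factorial(j) ** 2
               * factorial(i - j) * factorial(nu - i - 1))
        total += (-1) ** (i + j) * Fraction(num, den) * falling(t - 1, j)
    return total


def q_closed_form(i, nu):
    """Closed form for the sum over t = 1..nu of phi_i(t)^2."""
    prod = nu
    for r in range(1, i + 1):
        prod *= nu * nu - r * r
    return Fraction(factorial(i) ** 4, factorial(2 * i) * factorial(2 * i + 1)) * prod


nu = 8
for i in range(4):
    for j in range(4):
        s = sum(phi(i, t, nu) * phi(j, t, nu) for t in range(1, nu + 1))
        assert s == (q_closed_form(i, nu) if i == j else 0)
print(phi(1, 1, nu), q_closed_form(1, nu))
```

For i = 1 the formula reduces to $\phi_1(t) = t - (\nu + 1)/2$ and $Q_1 = \nu(\nu^2 - 1)/12$, which gives a quick hand check.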


Expanding $f_m(t)$ into a sum of these polynomials, we find

(3.3)  $f_m(t) = A_0 \phi_0(t) + A_1 \phi_1(t) + \cdots + A_m \phi_m(t),$

where

$A_i = \frac{1}{Q_i} \sum_{t=1}^{\nu} f_m(t) \phi_i(t).$

The least-square estimators $\hat{A}_i$ of $A_i$ are clearly

(3.4)  $\hat{A}_i = \frac{1}{Q_i} \sum_{t=1}^{\nu} y_t \phi_i(t) = A_i + a_i,$

where

(3.5)  $a_i = \frac{1}{Q_i} \sum_{t=1}^{\nu} x_t \phi_i(t).$

Similarly,

$f_m(t + k) = B_0 \phi_0(t) + B_1 \phi_1(t) + \cdots + B_m \phi_m(t),$

where

$B_i = \frac{1}{Q_i} \sum_{t=1}^{\nu} f_m(t + k) \phi_i(t),$

and the least-square estimators $\hat{B}_i$ of $B_i$ are

(3.6)  $\hat{B}_i = B_i + b_i,$

where

(3.7)  $b_i = \frac{1}{Q_i} \sum_{t=1}^{\nu} x_{t+k} \phi_i(t).$

It should be noted that, while they can be expressed by means of the
coefficients $A_0, A_1, \cdots, A_m$, the coefficients $B_0, B_1, \cdots, B_m$ form a
different set. Evidently,

$\sum_{i=0}^{m} A_i \phi_i(t + k) = \sum_{i=0}^{m} B_i \phi_i(t) = f_m(t + k),$

but the use of the estimators $\{\hat{A}_i\}$ and $\{\hat{B}_i\}$ leads to two different estimators
of the polynomial $f_m(t)$, one based on the sample values $y_1, y_2, \cdots, y_{N-k}$, and
the other on $y_{k+1}, y_{k+2}, \cdots, y_N$; this is in keeping with the method applied
by Kendall [4].







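It is proposed to investigate, as an estimator of the covariance of the $\{x_t\}$
process, the expression

$C^{*}_{k,N} = \frac{1}{\nu} \sum_{t=1}^{\nu} \left[ y_t - \sum_{i=0}^{m} \hat{A}_i \phi_i(t) \right] \left[ y_{t+k} - \sum_{i=0}^{m} \hat{B}_i \phi_i(t) \right],$

which, in view of (2.1), (3.4), and (3.6), is equal to

(3.8)  $C^{*}_{k,N} = \frac{1}{\nu} \sum_{t=1}^{\nu} \left[ x_t - \sum_{i=0}^{m} a_i \phi_i(t) \right] \left[ x_{t+k} - \sum_{i=0}^{m} b_i \phi_i(t) \right].$

LEMMA 5. Under the assumptions of Section 2,

(3.9)  $C^{*}_{k,N} = C_{k,N} - \frac{1}{\nu} \sum_{i=0}^{m} a_i b_i Q_i.$

PROOF. According to (3.8),

(3.10)  $C^{*}_{k,N} = C_{k,N} - \frac{1}{\nu} \sum_{t=1}^{\nu} x_{t+k} \sum_{i=0}^{m} a_i \phi_i(t) - \frac{1}{\nu} \sum_{t=1}^{\nu} x_t \sum_{i=0}^{m} b_i \phi_i(t) + \frac{1}{\nu} \sum_{t=1}^{\nu} \sum_{i,j=0}^{m} a_i b_j \phi_i(t) \phi_j(t).$

In view of (3.2), the last sum in (3.10) is equal to

$\frac{1}{\nu} \sum_{i=0}^{m} a_i b_i Q_i,$

while, owing to (3.5) and (3.7), the remaining two sums also have this value.
This completes the proof of the lemma.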
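The identity of Lemma 5 is purely algebraic, so it can be checked on any numerical sample. The sketch below is an illustration under stated assumptions: the orthogonal polynomials are built by Gram-Schmidt on centred powers of t rather than from the closed form (any mutually orthogonal basis of the polynomials of degree at most m works for this identity), and the helper names `orthogonal_polys` and `lemma5_sides` are hypothetical.

```python
import random


def orthogonal_polys(nu, m):
    """Values on t = 1..nu of m + 1 mutually orthogonal polynomials (Gram-Schmidt)."""
    ts = [t - (nu + 1) / 2 for t in range(1, nu + 1)]  # centred for numerical stability
    polys = []
    for i in range(m + 1):
        p = [x ** i for x in ts]
        for q in polys:
            coef = sum(u * v for u, v in zip(p, q)) / sum(v * v for v in q)
            p = [u - coef * v for u, v in zip(p, q)]
        polys.append(p)
    return polys


def lemma5_sides(x, k, m):
    """Return (C* computed from residuals, C - (1/nu) * sum_i a_i b_i Q_i)."""
    nu = len(x) - k
    polys = orthogonal_polys(nu, m)
    Q = [sum(v * v for v in p) for p in polys]
    a = [sum(x[t] * p[t] for t in range(nu)) / q for p, q in zip(polys, Q)]
    b = [sum(x[t + k] * p[t] for t in range(nu)) / q for p, q in zip(polys, Q)]
    res_a = [x[t] - sum(a[i] * polys[i][t] for i in range(m + 1)) for t in range(nu)]
    res_b = [x[t + k] - sum(b[i] * polys[i][t] for i in range(m + 1)) for t in range(nu)]
    c_star = sum(u * v for u, v in zip(res_a, res_b)) / nu
    c_plain = sum(x[t] * x[t + k] for t in range(nu)) / nu
    correction = sum(a[i] * b[i] * Q[i] for i in range(m + 1)) / nu
    return c_star, c_plain - correction


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(40)]
print(lemma5_sides(sample, 3, 2))
```

The two returned values should agree up to rounding error, whatever the sample.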


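LEMMA 6. With $i$ being any fixed non-negative integer, and $\phi_i(t)$ being given by
(3.1), if $\sum_{t=-\infty}^{+\infty} c_t$ is convergent, and if

(3.11)  $\gamma_\nu = \frac{1}{Q_i} \sum_{t,s=1}^{\nu} \phi_i(t) \phi_i(s) c_{t-s},$

then

$\lim_{\nu \to \infty} \gamma_\nu = \sum_{t=-\infty}^{+\infty} c_t.$

PROOF. Make

(3.12)  $d^{(\nu)}_{\alpha,i} = \sum_{s=1}^{\nu-\alpha} \phi_i(s) \phi_i(s + \alpha)$  ($\alpha = 0, 1, \cdots, \nu - 1$);  $d^{(\nu)}_{\nu,i} = 0,$

noting that $d^{(\nu)}_{0,i} = Q_i$. Substituting $\alpha$ for $t - s$ in (3.11), we find

(3.13)  $\gamma_\nu = \frac{1}{d^{(\nu)}_{0,i}} \left[ d^{(\nu)}_{0,i} c_0 + \sum_{\alpha=1}^{\nu-1} d^{(\nu)}_{\alpha,i} (c_\alpha + c_{-\alpha}) \right].$

Introducing the notation

$S_\alpha = \sum_{t=-\alpha}^{\alpha} c_t,$

we can apply Abel's Transformation to the right-hand side of (3.13), obtaining

(3.14)  $\gamma_\nu = \frac{1}{d^{(\nu)}_{0,i}} \sum_{\alpha=0}^{\nu-1} (d^{(\nu)}_{\alpha,i} - d^{(\nu)}_{\alpha+1,i}) S_\alpha.$

This formula shows that the sequence $\{\gamma_\nu\}$ can be obtained from the sequence
$\{S_\alpha\}$ by a linear transformation, the coefficients of which are equal to

(3.15)  $\delta_{\nu,\alpha} = \frac{1}{d^{(\nu)}_{0,i}} [d^{(\nu)}_{\alpha,i} - d^{(\nu)}_{\alpha+1,i}].$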

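According to a well-known theorem of Toeplitz (see, for example, [6]), the
transformed sequence $\{\gamma_\nu\}$ is convergent to the same limit as $\{S_\alpha\}$, if the
following three conditions are satisfied by the coefficients of the transformation:

(i) if $\lim_{\nu \to \infty} \delta_{\nu,\alpha} = 0$, for any fixed $\alpha$,
(ii) if there exists a constant $K$ such that, for every $\nu$,
$\sum_{\alpha=0}^{\nu-1} |\delta_{\nu,\alpha}| < K,$
(iii) if $\lim_{\nu \to \infty} \sum_{\alpha=0}^{\nu-1} \delta_{\nu,\alpha} = 1$.

In order to prove (i) and (ii), note that

(3.16)  $d^{(\nu)}_{\alpha,i} - d^{(\nu)}_{\alpha+1,i} = -\sum_{s=1}^{\nu-\alpha-1} \phi_i(s) \Delta\phi_i(s + \alpha) + \phi_i(\nu - \alpha) \phi_i(\nu).$

According to (3.1),

$\phi_i(s) = \sum_{j=0}^{i} f^{(\nu)}_{ij} (s - 1)^{(j)}$ and $\Delta\phi_i(s) = \sum_{j=0}^{i} j f^{(\nu)}_{ij} (s - 1)^{(j-1)},$

where

$f^{(\nu)}_{ij} = (-1)^{i+j} \frac{(i!)^2 (i + j)!\, (\nu - j - 1)!}{(2i)!\, (j!)^2 (i - j)!\, (\nu - i - 1)!}.$

Thus, clearly,

(3.17)  $|f^{(\nu)}_{ij}| \le A \nu^{i-j},$

where

$A = \max_{0 \le j \le i} \frac{(i!)^2 (i + j)!}{(2i)!\, (j!)^2 (i - j)!}.$

Hence

$|\phi_i(s)| = \left| \sum_{j=0}^{i} f^{(\nu)}_{ij} (s - 1)^{(j)} \right| \le A \sum_{j=0}^{i} \nu^{i} = A(i + 1) \nu^{i},$

and, a fortiori, $|\phi_i(\nu - \alpha)| \le A(i + 1) \nu^{i}$, so that the second term in the right-
hand side of (3.16) does not exceed $A^2 (i + 1)^2 \nu^{2i}$. On the other hand, owing to
(3.17),

$\left| \sum_{s=1}^{\nu-\alpha-1} \phi_i(s) \Delta\phi_i(s + \alpha) \right| = \left| \sum_{s=1}^{\nu-\alpha-1} \sum_{j,l=0}^{i} l f^{(\nu)}_{ij} f^{(\nu)}_{il} (s - 1)^{(j)} (s + \alpha - 1)^{(l-1)} \right|$
$\le A^2 i \sum_{j,l=0}^{i} \nu^{2i-j-l} \sum_{s=1}^{\nu-\alpha-1} s^{j} (s + \alpha)^{l-1} \le A^2 (i + 1)^3 \nu^{2i}.$

Thus there exists a constant $A_0$ such that

$|d^{(\nu)}_{\alpha,i} - d^{(\nu)}_{\alpha+1,i}| < A_0 \nu^{2i};$

but according to (3.2),

$d^{(\nu)}_{0,i} = Q_i = \frac{(i!)^4}{(2i)!\, (2i + 1)!}\, \nu (\nu^2 - 1) \cdots (\nu^2 - i^2),$

and, in view of (3.15), this shows that $\lim_{\nu \to \infty} \delta_{\nu,\alpha} = 0$, and that there exists a
constant $K$ such that $\sum_{\alpha=0}^{\nu-1} |\delta_{\nu,\alpha}| < K$.

Thus Conditions (i) and (ii) of the Toeplitz Theorem are satisfied; so is
Condition (iii), since $\sum_{\alpha=0}^{\nu-1} \delta_{\nu,\alpha} = 1$ identically, as can be seen by summing (3.15).
Therefore,

(3.18)  $\lim_{\nu \to \infty} \gamma_\nu = \lim_{\nu \to \infty} S_\nu = \sum_{t=-\infty}^{+\infty} c_t.$

Hence the proof of Lemma 6 is complete.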
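The Abel transformation step and the convergence asserted in Lemma 6 can both be observed numerically. In the sketch below the weighted double sum $\gamma_\nu = Q^{-1} \sum_{t,s} \phi(t)\phi(s) c_{t-s}$ is computed directly and again through the partial sums $S_\alpha = \sum_{|t| \le \alpha} c_t$ with weights $d_\alpha - d_{\alpha+1}$, where $d_\alpha = \sum_s \phi(s)\phi(s+\alpha)$. The function names, the choice of the centred linear polynomial for $\phi$, and the test sequence $c_t = 2^{-|t|}$ (whose sum is 3) are all illustrative assumptions.

```python
def gamma_direct(phi, c):
    """(1/Q) * sum_{t,s} phi(t) phi(s) c(t - s), with Q = sum_t phi(t)^2."""
    nu = len(phi)
    Q = sum(v * v for v in phi)
    return sum(phi[t] * phi[s] * c(t - s) for t in range(nu) for s in range(nu)) / Q


def gamma_abel(phi, c):
    """The same quantity after Abel's transformation: sum_a (d_a - d_{a+1}) S_a / d_0."""
    nu = len(phi)
    d = [sum(phi[s] * phi[s + al] for s in range(nu - al)) for al in range(nu)] + [0.0]
    total, partial = 0.0, 0.0
    for al in range(nu):
        partial += c(0) if al == 0 else c(al) + c(-al)  # S_alpha
        total += (d[al] - d[al + 1]) * partial
    return total / d[0]


nu = 300
phi1 = [t - (nu + 1) / 2 for t in range(1, nu + 1)]  # centred linear polynomial
c = lambda t: 0.5 ** abs(t)                          # sum over all t equals 3
print(gamma_direct(phi1, c), gamma_abel(phi1, c))
```

Both values coincide up to rounding and lie slightly below 3, with the gap shrinking roughly like 1/ν as ν grows.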
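Now it can be shown that $C^{*}_{k,N}$ is an asymptotically unbiased estimator of
$R_k$ and that, to order $N^{-1}$, the bias is equal to $-(m + 1) \sum_{q=-\infty}^{+\infty} R_q / N$. More
precisely we have:

PROPOSITION 7. Under the assumptions of Section 2,

(3.19)  $\lim_{N \to \infty} N E(C^{*}_{k,N} - R_k) = -(m + 1) \sum_{q=-\infty}^{+\infty} R_q.$

PROOF. According to (3.9),

$\nu E(C^{*}_{k,N} - R_k) = -\sum_{i=0}^{m} Q_i E(a_i b_i);$

but, owing to (3.5) and (3.7),

(3.20)  $Q_i E(a_i b_i) = \frac{1}{Q_i} \sum_{t,s=1}^{\nu} \phi_i(t) \phi_i(s) R_{s-t+k}.$

Applying Lemma 6, we find

$\lim_{\nu \to \infty} Q_i E(a_i b_i) = \sum_{q=-\infty}^{+\infty} R_{q+k} = \sum_{q=-\infty}^{+\infty} R_q,$

which completes the proof of (3.19).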
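For a given autocovariance sequence the finite-sample quantity $\nu E(C^{*}_{k,N} - R_k) = -\sum_i Q_i^{-1} \sum_{t,s} \phi_i(t)\phi_i(s) R_{s-t+k}$ can be evaluated exactly and compared with the limit $-(m+1)\sum_q R_q$. The sketch below is illustrative: the helper names are hypothetical, the polynomials are obtained by Gram-Schmidt on centred powers, and the autocovariances are supplied as a dictionary mapping each lag q (positive and negative) to $R_q$, with missing lags treated as zero.

```python
def orthogonal_polys(nu, m):
    """Values on t = 1..nu of m + 1 mutually orthogonal polynomials (Gram-Schmidt)."""
    ts = [t - (nu + 1) / 2 for t in range(1, nu + 1)]
    polys = []
    for i in range(m + 1):
        p = [x ** i for x in ts]
        for q in polys:
            coef = sum(u * v for u, v in zip(p, q)) / sum(v * v for v in q)
            p = [u - coef * v for u, v in zip(p, q)]
        polys.append(p)
    return polys


def scaled_bias(R, k, nu, m):
    """nu * E(C* - R_k) = -sum_i (1/Q_i) sum_{t,s} phi_i(t) phi_i(s) R_{s-t+k}."""
    total = 0.0
    for phi in orthogonal_polys(nu, m):
        Q = sum(v * v for v in phi)
        acc = 0.0
        for lag, r in R.items():
            shift = lag - k              # s - t for which s - t + k equals this lag
            for t in range(nu):
                s = t + shift
                if 0 <= s < nu:
                    acc += phi[t] * phi[s] * r
        total += acc / Q
    return -total


white = {0: 1.0}
ma1 = {0: 1.25, 1: 0.5, -1: 0.5}         # weights (1, 0.5), unit innovation variance
print(scaled_bias(white, 0, 50, 2))       # -(m + 1) for white noise at lag 0
print(scaled_bias(ma1, 1, 400, 1))        # close to -(1 + 1) * 2.25
```

For white noise at lag zero the value is exactly -(m + 1) for every ν; for the moving-average example it approaches -(m + 1)(R_0 + 2R_1) = -4.5 as ν grows.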

It will be noticed that, according to (3.19), the bias of the covariance esti- 
mators based on (3.8) is negative. In the particular case when m = 0, (3.19) 
shows that this bias is asymptotically equal to the negative of the variance of 
the mean (see, for example, Lemma 2 in [7]), and this can be seen at once on the
basis of elementary considerations. The fact that in the general case the bias is 
asymptotically proportional to m + 1 is due to the superposition of the effects of 
fitting the successive orthogonal polynomials, each of which contributes the same 
generalized mean-square error. (See Lemma 6.) 







4. The covariance of the covariance estimators.

Since the elimination of a polynomial trend induces, in the covariance
estimators, a bias which is merely of the order of $1/N$, one is not surprised to find
that this procedure, asymptotically, does not affect the second and the fourth
moments. The proof of the relevant propositions is based on the following:

LEMMA 8. If $i$ is any fixed nonnegative integer and $\phi_i(t)$ is given by (3.1), if
$\sum_{t=-\infty}^{+\infty} c_t$ is absolutely convergent, and if

(4.1)  $\beta_\nu = \frac{1}{Q_i} \sum_{t,s=1}^{\nu} |\phi_i(s) \phi_i(t)|\, |c_{t-s}|,$

then there exists a constant $K$ independent of $\nu$ such that $\beta_\nu < K$.

PROOF. By making $t - s = \alpha$ and using the same argument which led to
(3.13), we obtain from (4.1)

(4.2)  $\beta_\nu = |c_0| + \frac{1}{Q_i} \sum_{\alpha=1}^{\nu-1} (|c_\alpha| + |c_{-\alpha}|) \sum_{s=1}^{\nu-\alpha} |\phi_i(s) \phi_i(s + \alpha)|;$

but, clearly,

$\sum_{s=1}^{\nu-\alpha} |\phi_i(s) \phi_i(s + \alpha)| = O(\nu^{2i+1}).$

Hence, in view of the second line of (3.2), $Q_i^{-1} \sum_{s=1}^{\nu-\alpha} |\phi_i(s) \phi_i(s + \alpha)|$ is bounded
by a constant independent of $\nu$; this, owing to the convergence of $\sum_{t=-\infty}^{+\infty} |c_t|$,
completes the proof.
PROPOSITION 9. Under the assumptions of Section 2,

(4.3)  $\lim_{N \to \infty} N\, \mathrm{cov}(C^{*}_{k,N}; C^{*}_{l,N}) = v_{k,l},$

where $v_{k,l}$ is given by (2.7).

PROOF. For any fixed $k$ and $l$ smaller than $N - m$, write $\nu = N - k$,
$\nu' = N - l$ and

(4.4)  $X_{ik} = \frac{1}{\nu} a_i b_i Q_i, \quad X_k = \sum_{i=0}^{m} X_{ik}; \qquad X_{il} = \frac{1}{\nu'} a'_i b'_i Q'_i, \quad X_l = \sum_{i=0}^{m} X_{il},$

where $Q'_i$, $a'_i$, and $b'_i$ are given by formulae obtained from (3.2), (3.5), and (3.7)
when the sequence of polynomials $\phi_i(t)$ orthogonal on the set $t = 1, 2, \cdots, \nu$
is replaced by a similar sequence of polynomials $\psi_i(t)$ orthogonal on the set
$t = 1, 2, \cdots, \nu'$.

According to (3.9),

(4.4')  $C^{*}_{k,N} = C_{k,N} - X_k; \qquad C^{*}_{l,N} = C_{l,N} - X_l;$

hence

(4.5)  $\mathrm{cov}(C^{*}_{k,N}; C^{*}_{l,N}) = \mathrm{cov}(C_{k,N}; C_{l,N}) - \mathrm{cov}(X_k; C_{l,N}) - \mathrm{cov}(X_l; C_{k,N}) + \mathrm{cov}(X_k; X_l).$







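But, substituting (3.5) and (3.7) in (4.4), we find

$X_{ik} = \frac{1}{\nu} \frac{1}{Q_i} \sum_{t,s=1}^{\nu} \phi_i(t) \phi_i(s) x_t x_{s+k},$

and, in consequence,

$\mathrm{var}\, X_{ik} = \frac{1}{\nu^2} \frac{1}{Q_i^2} \sum_{t,s,p,q=1}^{\nu} \phi_i(t) \phi_i(s) \phi_i(p) \phi_i(q)\, \mathrm{cov}(x_t x_{s+k}; x_p x_{q+k});$

or, in view of (2.6),

(4.6)  $\mathrm{var}\, X_{ik} = \frac{1}{\nu^2} \frac{1}{Q_i^2} \sum_{t,s,p,q=1}^{\nu} \phi_i(t) \phi_i(s) \phi_i(p) \phi_i(q) \left\{ R_{t-p} R_{s-q} + R_{t-q-k} R_{s-p+k} + \kappa_4 \sum_{\tau=-\infty}^{+\infty} h_{t-\tau} h_{s+k-\tau} h_{p-\tau} h_{q+k-\tau} \right\}.$

This fourfold sum yields three terms, the first and the second of which are
respectively equal to

$\frac{1}{\nu^2} \left[ \frac{1}{Q_i} \sum_{t,p=1}^{\nu} \phi_i(t) \phi_i(p) R_{t-p} \right]^2$

and

$\frac{1}{\nu^2} \left[ \frac{1}{Q_i} \sum_{t,q=1}^{\nu} \phi_i(t) \phi_i(q) R_{t-q-k} \right] \left[ \frac{1}{Q_i} \sum_{s,p=1}^{\nu} \phi_i(s) \phi_i(p) R_{s-p+k} \right].$

Lemma 6 shows that each of these expressions in brackets tends to the finite
limit $\sum_{q=-\infty}^{+\infty} R_q$, when $\nu \to \infty$; hence this part of var $X_{ik}$ is $O(1/\nu^2)$.
The third sum is obviously dominated by

(4.7)  $\frac{|\kappa_4|}{\nu^2} \left[ \frac{1}{Q_i} \sum_{t,s=1}^{\nu} |\phi_i(t) \phi_i(s)| \sum_{\tau=-\infty}^{+\infty} |h_{t-\tau} h_{s+k-\tau}| \right]^2,$

and the fact that this expression is also $O(1/\nu^2)$ can be proved as follows:
If

(4.8)  $\mathcal{R}_q = \sum_{\tau=-\infty}^{+\infty} |h_{t-\tau} h_{t+q-\tau}| = \sum_{\tau=-\infty}^{+\infty} |h_\tau h_{\tau+q}|,$

the expression in brackets in (4.7) becomes

(4.9)  $\frac{1}{Q_i} \sum_{t,s=1}^{\nu} |\phi_i(t) \phi_i(s)|\, \mathcal{R}_{s-t+k}.$

Obviously, the infinite sum $\sum_{q=-\infty}^{+\infty} \mathcal{R}_q$ is convergent, while the boundedness
of (4.9) follows from Lemma 8. Thus

(4.10)  $\mathrm{var}\, X_{ik} = O\left( \frac{1}{\nu^2} \right).$

From the definition of $X_k$ and from the Triangle Inequality, it follows that
var $X_k = O(1/\nu^2)$ and similarly var $X_l = O(1/\nu'^2)$; by applying the Schwarz
Inequality to cov $(X_k; C_{l,N})$, cov $(X_l; C_{k,N})$, and cov $(X_k; X_l)$ in (4.5), it can
be seen that these covariances are respectively $O(N^{-3/2})$, $O(N^{-3/2})$, $O(N^{-2})$, which,
in conjunction with (2.9), completes the proof.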







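COROLLARY 10. Under the assumptions of Section 2,

(4.11)  $\lim_{N \to \infty} N\, \mathrm{var}\, C^{*}_{k,N} = v_{k,k}.$

LEMMA 11. Under the assumptions of Section 2,

(4.12)  $E(X_{ik}^4) = O(1/N^3).$

PROOF. Clearly

(4.13)  $E(X_{ik}^4) = \frac{1}{\nu^4} \frac{1}{Q_i^4} E\left\{ \sum_{t,s=1}^{\nu} \phi_i(t) \phi_i(s) x_t x_{s+k} \right\}^4$
        $= \frac{1}{\nu^4} \frac{1}{Q_i^4} \sum_{t_1, \cdots, t_4, s_1, \cdots, s_4 = 1}^{\nu} \phi_i(t_1) \phi_i(t_2) \phi_i(t_3) \phi_i(t_4) \phi_i(s_1) \phi_i(s_2) \phi_i(s_3) \phi_i(s_4)\, E(x_{t_1} x_{s_1+k} x_{t_2} x_{s_2+k} x_{t_3} x_{s_3+k} x_{t_4} x_{s_4+k}).$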
Owing to (2.5), Eq. (2.2) can be written as follows:

$x_t = \sum_{q=-\infty}^{+\infty} h_{t-q} \epsilon_q,$

so that

(4.14)  $E(x_i x_j x_l x_m x_n x_p x_r x_s) = \sum_{q_1, q_2, \cdots, q_8 = -\infty}^{+\infty} h_{i-q_1} h_{j-q_2} h_{l-q_3} h_{m-q_4} h_{n-q_5} h_{p-q_6} h_{r-q_7} h_{s-q_8}\, E(\epsilon_{q_1} \epsilon_{q_2} \cdots \epsilon_{q_8}).$

But, in view of our assumptions concerning the moments of the $\epsilon$'s, only those
moments do not vanish which correspond to equal indices in sets obtained by
the following types of partitions of the eight indices $q_1, q_2, \cdots, q_8$: (8), (2, 6),
(3, 5), (4, 4), (2, 2, 4), (2, 3, 3), (2, 2, 2, 2).

Thus the right-hand side of (4.14) becomes:

(4.15)
$\mu_8 \sum_{q=-\infty}^{+\infty} h_{i-q} h_{j-q} h_{l-q} h_{m-q} h_{n-q} h_{p-q} h_{r-q} h_{s-q}$
$+\ \sigma^2 \mu_6 \sum_{(2,6)} {\sum_{q_1, q_2}}' h_{i-q_1} h_{j-q_1} h_{l-q_2} h_{m-q_2} h_{n-q_2} h_{p-q_2} h_{r-q_2} h_{s-q_2}$
$+\ \mu_3 \mu_5 \sum_{(3,5)} {\sum_{q_1, q_2}}' h_{i-q_1} h_{j-q_1} h_{l-q_1} h_{m-q_2} h_{n-q_2} h_{p-q_2} h_{r-q_2} h_{s-q_2}$
$+\ \mu_4^2 \sum_{(4,4)} {\sum_{q_1, q_2}}' h_{i-q_1} h_{j-q_1} h_{l-q_1} h_{m-q_1} h_{n-q_2} h_{p-q_2} h_{r-q_2} h_{s-q_2}$
$+\ \sigma^4 \mu_4 \sum_{(2,2,4)} {\sum_{q_1, q_2, q_3}}' h_{i-q_1} h_{j-q_1} h_{l-q_2} h_{m-q_2} h_{n-q_3} h_{p-q_3} h_{r-q_3} h_{s-q_3}$
$+\ \mu_3^2 \sigma^2 \sum_{(2,3,3)} {\sum_{q_1, q_2, q_3}}' h_{i-q_1} h_{j-q_1} h_{l-q_2} h_{m-q_2} h_{n-q_2} h_{p-q_3} h_{r-q_3} h_{s-q_3}$
$+\ \sigma^8 \sum_{(2,2,2,2)} {\sum_{q_1, q_2, q_3, q_4}}' h_{i-q_1} h_{j-q_1} h_{l-q_2} h_{m-q_2} h_{n-q_3} h_{p-q_3} h_{r-q_4} h_{s-q_4},$

where $\sum_{(2,6)}$ is extended to all the 28 partitions of the type (2, 6), $\sum_{(3,5)}$ to all
the 56 partitions of the type (3, 5), and so on, and the prime after the sign of
multiple sum denotes the exclusion of all the terms in which not all the
summation indices are pairwise distinct.

Each of the multiple sums taken with respect to $q_1, q_2$, etc. is dominated by
an expression of the type of

$\mathcal{R}_{i-j} \mathcal{R}_{l-m} \mathcal{R}_{n-p} \mathcal{R}_{r-s},$

where $\mathcal{R}_t$ is defined by (4.8). The only exceptions are the sums occurring in the
third and the sixth lines of (4.15), which are dominated by products of the type
of

$\mathcal{R}_{i-j} \mathcal{R}_{l-m} \mathcal{R}_{n-p} \mathcal{H}^2,$

where $\mathcal{H} = \sum_{t=0}^{\infty} |h_t|$.

When (4.15) is substituted for $E(x_{t_1} x_{s_1+k} \cdots x_{t_4} x_{s_4+k})$ in (4.13), the contribution
of the first sum of (4.15) is dominated by

$\frac{\mu_8}{\nu^4} \left[ \frac{1}{Q_i} \sum_{t,s=1}^{\nu} |\phi_i(t) \phi_i(s)|\, \mathcal{R}_{s-t+k} \right]^4.$

But, in view of Lemma 8, the expression in brackets is bounded by a constant
independent of $\nu$, so that this contribution is $O(1/\nu^4)$. In a similar way it can
be shown that all the other sums in (4.15) contribute $O(1/\nu^4)$ when substituted
in (4.13), the only exceptions being the sums appearing in the third and the
sixth lines. But the contributions of these sums are dominated by expressions
of the type of

$\frac{\mathcal{H}^2}{\nu^4} \left[ \frac{1}{Q_i} \sum_{t,s=1}^{\nu} |\phi_i(t) \phi_i(s)|\, \mathcal{R}_{s-t+k} \right]^3 \left[ \frac{1}{Q_i} \sum_{t,s=1}^{\nu} |\phi_i(s) \phi_i(t)| \right]$

multiplied by $\mu_3 \mu_5$ and $\mu_3^2 \sigma^2$ respectively; here the first factor in brackets is
bounded, while the second is

$\frac{1}{Q_i} \sum_{t,s=1}^{\nu} O(\nu^{2i}) = O(\nu),$

so that the contributions of these sums are $O(1/\nu^3)$.

Thus the total contribution of (4.15) to (4.13) is a sum of a fixed number of
contributions, each of which is either $O(1/\nu^4)$ or $O(1/\nu^3)$. Hence the proof is
complete.

COROLLARY 12. Under the assumptions of Section 2,

(4.16)  $E[X_k - E(X_k)]^4 = O(1/N^3).$

PROOF. According to Lemma 11 and to the Minkowski Inequality, $E(X_k^4) =
O(1/N^3)$, and (4.16) follows at once.

PROPOSITION 13. Under the assumptions of Section 2,

$\lim_{N \to \infty} N^2 E[C^{*}_{k,N} - E(C^{*}_{k,N})]^4 = 3 v_{k,k}^2.$

PROOF. According to (4.4'),

$C^{*}_{k,N} - E(C^{*}_{k,N}) = (C_{k,N} - R_k) - (X_k - E(X_k))$

and, therefore,

$N^2 E[C^{*}_{k,N} - E(C^{*}_{k,N})]^4 = N^2 E(C_{k,N} - R_k)^4 - 4N^2 E\{(C_{k,N} - R_k)^3 (X_k - E(X_k))\}$
$\quad + 6N^2 E\{(C_{k,N} - R_k)^2 (X_k - E(X_k))^2\} - 4N^2 E\{(C_{k,N} - R_k) [X_k - E(X_k)]^3\}$
$\quad + N^2 E[X_k - E(X_k)]^4.$

The first term in the right-hand side tends to $3 v_{k,k}^2$ according to Proposition 4,
and the last is $O(1/N)$ owing to Corollary 12; by repeated applications of the
Schwarz Inequality we deduce from (2.10) and (4.16) that the remaining terms
are respectively $O(N^{-1/4})$, $O(N^{-1/2})$, $O(N^{-3/4})$, and this completes the proof.


5. The method of “statistical differentials.” 

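THEOREM 14. Let $H(Y_1, Y_2, \cdots, Y_p)$ and $G(Y_1, Y_2, \cdots, Y_p)$ be any two
functions vanishing at the point $(0, 0, \cdots, 0)$ and having continuous partial
derivatives of the first and second orders in the neighbourhood of this point. Let
$y_1^{(N)}, y_2^{(N)}, \cdots, y_p^{(N)}$ be any random variables with

(5.1)  $\lim_{N \to \infty} N E(y_i^{(N)}) = c_i \quad (i = 1, 2, \cdots, p),$

(5.2)  $\lim_{N \to \infty} N\, \mathrm{cov}(y_i^{(N)}, y_j^{(N)}) = c_{ij} \quad (i, j = 1, 2, \cdots, p),$

$c_i$ and $c_{ij}$ being constants. Moreover, assume

(5.3)  $E[y_i^{(N)} - E(y_i^{(N)})]^4 = O(1/N^2) \quad (i = 1, 2, \cdots, p).$

Then, if $H(y_1^{(N)}, y_2^{(N)}, \cdots, y_p^{(N)})$ and $G(y_1^{(N)}, y_2^{(N)}, \cdots, y_p^{(N)})$ are bounded
uniformly with respect to $N$,

(5.4)  $\lim_{N \to \infty} N E[H(y_1^{(N)}, y_2^{(N)}, \cdots, y_p^{(N)})] = \sum_{i=1}^{p} \frac{\partial H}{\partial Y_i} c_i + \frac{1}{2} \sum_{i,j=1}^{p} \frac{\partial^2 H}{\partial Y_i \partial Y_j} c_{ij}$

and

(5.5)  $\lim_{N \to \infty} N\, \mathrm{cov}[H(y_1^{(N)}, y_2^{(N)}, \cdots, y_p^{(N)});\ G(y_1^{(N)}, y_2^{(N)}, \cdots, y_p^{(N)})] = \sum_{i,j=1}^{p} \frac{\partial H}{\partial Y_i} \frac{\partial G}{\partial Y_j} c_{ij},$

the partial derivatives being taken at the point $(0, 0, \cdots, 0)$. (Obviously, if we
make $H = G$, (5.5) yields the corresponding formula for var $H$.)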
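The right-hand sides of (5.4) and (5.5) are simple linear and bilinear forms in the derivatives of H and G at the origin. The sketch below evaluates them for an illustrative choice that is not taken from the text: $H(Y_1, Y_2) = (\rho + Y_1)/(1 + Y_2) - \rho$, the deviation of a ratio from its limiting value $\rho$, whose derivatives at the origin are $H_1 = 1$, $H_2 = -\rho$, $H_{11} = 0$, $H_{12} = -1$, $H_{22} = 2\rho$. The function names and the numerical values of $c_i$ and $c_{ij}$ are hypothetical, and the boundedness hypothesis of the theorem would have to be checked separately for any real application.

```python
def delta_mean(grad, hess, c_vec, c_mat):
    """Right-hand side of (5.4): sum_i H_i c_i + (1/2) sum_{i,j} H_ij c_ij."""
    p = len(grad)
    first = sum(grad[i] * c_vec[i] for i in range(p))
    second = sum(hess[i][j] * c_mat[i][j] for i in range(p) for j in range(p))
    return first + 0.5 * second


def delta_cov(grad_h, grad_g, c_mat):
    """Right-hand side of (5.5): sum_{i,j} H_i G_j c_ij."""
    p = len(grad_h)
    return sum(grad_h[i] * grad_g[j] * c_mat[i][j] for i in range(p) for j in range(p))


rho = 0.5
grad = [1.0, -rho]                          # first derivatives of H at the origin
hess = [[0.0, -1.0], [-1.0, 2.0 * rho]]     # second derivatives of H at the origin
c_vec = [-1.0, -2.0]                        # assumed limits of N E(y_i)
c_mat = [[1.0, 0.5], [0.5, 2.0]]            # assumed limits of N cov(y_i, y_j)

print(delta_mean(grad, hess, c_vec, c_mat))  # limiting N * bias of the ratio
print(delta_cov(grad, grad, c_mat))          # limiting N * variance of the ratio
```

With these numbers the variance form reduces to $c_{11} - 2\rho c_{12} + \rho^2 c_{22}$, the usual first-order expression for the variance of a ratio.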
PROOF. In the first place we note that, if $\delta$ is any positive number,

(5.6)  $P(|y_i^{(N)} - E(y_i^{(N)})| \ge \delta) = O(1/N^2),$

which follows from (5.3) and from the generalized Chebyshev Inequality:

(5.7)  $E[y_i^{(N)} - E(y_i^{(N)})]^4 \ge \delta^4 P(|y_i^{(N)} - E(y_i^{(N)})| \ge \delta).$







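On the other hand, it follows from (5.1) and (5.2) that

(5.8)  $\lim_{N \to \infty} N E(y_i^{(N)} y_j^{(N)}) = \lim_{N \to \infty} N[\mathrm{cov}(y_i^{(N)}, y_j^{(N)}) + E(y_i^{(N)}) E(y_j^{(N)})] = c_{ij}.$

Since, according to the Schwarz Inequality, to (5.2), and to (5.3),

$|E[y_i^{(N)} - E(y_i^{(N)})]^3| \le \{E[y_i^{(N)} - E(y_i^{(N)})]^4\}^{1/2} (\mathrm{var}\, y_i^{(N)})^{1/2} = O(N^{-3/2}),$

(5.3) entails

(5.9)  $E[(y_i^{(N)})^4] = O(1/N^2).$

If $\epsilon$ denotes an arbitrary positive number, make

$\eta = 2\epsilon \Big/ \left\{ 1 + \sum_{i,j=1}^{p} (c_{ii} c_{jj})^{1/2} \right\},$

and let the positive number $\delta$ be sufficiently small to ensure that, whenever

$|Y_i| < \delta \quad (i = 1, 2, \cdots, p),$

the second-order partial derivatives of $H$ and $G$ are continuous and differ from
their respective values at the origin by less than $\eta$. Finally, let $\mathcal{E}$ be the event

$|y_i^{(N)}| < \delta \quad (i = 1, 2, \cdots, p),$

and $\bar{\mathcal{E}}$ the complementary event. Clearly, both events depend on $N$ and $\delta$, but
the abbreviated notation $\mathcal{E}$ for $\mathcal{E}_{N,\delta}$ and $\bar{\mathcal{E}}$ for $\bar{\mathcal{E}}_{N,\delta}$ should not lead to
misunderstandings. Furthermore,

$P(\bar{\mathcal{E}}) \le \sum_{i=1}^{p} P(|y_i^{(N)}| \ge \delta),$

which, according to (5.9), entails

(5.10)  $P(\bar{\mathcal{E}}) = O(1/N^2).$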

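Writing $y^{(N)}$ for the vector $(y_1^{(N)}, y_2^{(N)}, \ldots, y_p^{(N)})$, and denoting by $P$ the corresponding probability function, we have

$$E[H(y^{(N)})] = \int_{\mathcal{E}} H(y^{(N)})\, dP + \int_{\bar{\mathcal{E}}} H(y^{(N)})\, dP. \tag{5.11}$$

However, owing to the assumptions made, and in view of (5.10),

$$\Big|\int_{\bar{\mathcal{E}}} H(y^{(N)})\, dP\Big| \le \max |H(y^{(N)})| \cdot P(\bar{\mathcal{E}}) = O\Big(\frac{1}{N^2}\Big). \tag{5.12}$$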


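In the event $\mathcal{E}$, clearly,

$$H(y^{(N)}) = \sum_{i=1}^{p} H_i(0)\, y_i^{(N)} + \frac{1}{2} \sum_{i,j=1}^{p} H_{ij}(\theta y^{(N)})\, y_i^{(N)} y_j^{(N)}, \tag{5.13}$$

where

$$H_i(Y) = \frac{\partial H}{\partial Y_i}, \qquad H_{ij}(Y) = \frac{\partial^2 H}{\partial Y_i\, \partial Y_j} \qquad (i, j = 1, 2, \ldots, p);$$

$Y$ stands for the vector $(Y_1, Y_2, \ldots, Y_p)$ and $0 < \theta < 1$. Hence, owing to (5.11) and (5.12),

$$E[H(y^{(N)})] = \sum_{i=1}^{p} H_i(0) \int_{\mathcal{E}} y_i^{(N)}\, dP + \frac{1}{2} \sum_{i,j=1}^{p} \int_{\mathcal{E}} H_{ij}(\theta y^{(N)})\, y_i^{(N)} y_j^{(N)}\, dP + O\Big(\frac{1}{N^2}\Big),$$

or

$$\begin{aligned} E[H(y^{(N)})] = {} & \sum_{i=1}^{p} H_i(0) E(y_i^{(N)}) + \frac{1}{2} \sum_{i,j=1}^{p} H_{ij}(0) E(y_i^{(N)} y_j^{(N)}) \\ & - \sum_{i=1}^{p} H_i(0) \int_{\bar{\mathcal{E}}} y_i^{(N)}\, dP - \frac{1}{2} \sum_{i,j=1}^{p} H_{ij}(0) \int_{\bar{\mathcal{E}}} y_i^{(N)} y_j^{(N)}\, dP \\ & + \frac{1}{2} \sum_{i,j=1}^{p} \int_{\mathcal{E}} [H_{ij}(\theta y^{(N)}) - H_{ij}(0)]\, y_i^{(N)} y_j^{(N)}\, dP + O\Big(\frac{1}{N^2}\Big). \end{aligned} \tag{5.14}$$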
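However, owing to the Schwarz Inequality and according to (5.8) and (5.10),

$$\Big|\int_{\bar{\mathcal{E}}} y_i^{(N)}\, dP\Big| \le \{E[(y_i^{(N)})^2]\, P(\bar{\mathcal{E}})\}^{1/2} = O(N^{-3/2}). \tag{5.15}$$

Similarly, owing to (5.9),

$$\Big|\int_{\bar{\mathcal{E}}} y_i^{(N)} y_j^{(N)}\, dP\Big| \le \Big\{\int (y_i^{(N)} y_j^{(N)})^2\, dP \cdot P(\bar{\mathcal{E}})\Big\}^{1/2} \le \{E[(y_i^{(N)})^4]\, E[(y_j^{(N)})^4]\}^{1/4}\, [P(\bar{\mathcal{E}})]^{1/2} = O\Big(\frac{1}{N^2}\Big). \tag{5.16}$$

Finally, by the Schwarz Inequality and in view of the definition of $\delta$,

$$\Big|\int_{\mathcal{E}} [H_{ij}(\theta y^{(N)}) - H_{ij}(0)]\, y_i^{(N)} y_j^{(N)}\, dP\Big| \le \Big\{\int_{\mathcal{E}} [H_{ij}(\theta y^{(N)}) - H_{ij}(0)]^2 (y_i^{(N)})^2\, dP\Big\}^{1/2} \Big\{\int_{\mathcal{E}} (y_j^{(N)})^2\, dP\Big\}^{1/2} \le \eta\, \{E[(y_i^{(N)})^2]\, E[(y_j^{(N)})^2]\}^{1/2}.$$

Hence

$$\limsup_{N\to\infty} N \Big|\int_{\mathcal{E}} [H_{ij}(\theta y^{(N)}) - H_{ij}(0)]\, y_i^{(N)} y_j^{(N)}\, dP\Big| \le \eta\, (c_{ii} c_{jj})^{1/2},$$

and, consequently,

$$\limsup_{N\to\infty} N \Big|\sum_{i,j=1}^{p} \int_{\mathcal{E}} [H_{ij}(\theta y^{(N)}) - H_{ij}(0)]\, y_i^{(N)} y_j^{(N)}\, dP\Big| \le \eta \sum_{i,j=1}^{p} (c_{ii} c_{jj})^{1/2}. \tag{5.17}$$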




Now, in view of (5.1), (5.8), (5.15), (5.16), and (5.17), we obtain from (5.14)

$$\limsup_{N\to\infty} \Big| N E[H(y^{(N)})] - \sum_{i=1}^{p} H_i(0)\, c_i - \frac{1}{2} \sum_{i,j=1}^{p} H_{ij}(0)\, c_{ij} \Big| < \varepsilon,$$

which proves (5.4), since the choice of $\varepsilon$ was arbitrary.
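An argument similar to that which led to (5.7) and (5.12) shows that

$$E[H(y^{(N)}) G(y^{(N)})] = \int_{\mathcal{E}} H(y^{(N)}) G(y^{(N)})\, dP + O\Big(\frac{1}{N^2}\Big).$$

Using the expansion of $H(y^{(N)})$ given by (5.13) and a similar expansion of $G(y^{(N)})$ based on a similar notation for the derivatives, the preceding equation can be written as follows:

$$\begin{aligned} E[H(y^{(N)}) G(y^{(N)})] = {} & \sum_{i,j=1}^{p} H_i(0) G_j(0) E(y_i^{(N)} y_j^{(N)}) - \sum_{i,j=1}^{p} H_i(0) G_j(0) \int_{\bar{\mathcal{E}}} y_i^{(N)} y_j^{(N)}\, dP \\ & + \frac{1}{2} \sum_{i,j,k=1}^{p} H_i(0) \int_{\mathcal{E}} G_{jk}(\theta' y^{(N)})\, y_i^{(N)} y_j^{(N)} y_k^{(N)}\, dP \\ & + \frac{1}{2} \sum_{i,j,k=1}^{p} G_k(0) \int_{\mathcal{E}} H_{ij}(\theta y^{(N)})\, y_i^{(N)} y_j^{(N)} y_k^{(N)}\, dP \\ & + \frac{1}{4} \sum_{i,j,k,l=1}^{p} \int_{\mathcal{E}} H_{ij}(\theta y^{(N)}) G_{kl}(\theta' y^{(N)})\, y_i^{(N)} y_j^{(N)} y_k^{(N)} y_l^{(N)}\, dP + O\Big(\frac{1}{N^2}\Big), \end{aligned} \tag{5.18}$$

where again $0 < \theta' < 1$.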

Multiplying both sides of this equality by $N$ and making $N \to \infty$, we obtain (5.5). Indeed, the left-hand sides agree in view of (5.1), and, owing to (5.8), the first term in the right-hand side of (5.18), after multiplication by $N$, tends to the right-hand side of (5.5). All the other terms in the right-hand side of (5.18) can be neglected: the second is $O(N^{-2})$ according to (5.16), and simple applications of the Schwarz Inequality show that the third and the fourth terms are $O(N^{-3/2})$, while the fifth term is again $O(N^{-2})$. Thus the proof is complete.
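The limits (5.4) and (5.5) are easy to evaluate once the derivatives at the origin and the constants $c_i$, $c_{ij}$ are known. The following is a minimal sketch, not part of the original argument: the function names are hypothetical, and the one-dimensional check assumes $y^{(N)}$ normal with mean 0 and variance $\sigma^2/N$ and $H(Y) = 1 - \cos Y$, for which $N E[H]$ is available in closed form.

```python
import math


def limit_scaled_mean(grad_h, hess_h, c, c_mat):
    """Right-hand side of (5.4): sum_i H_i c_i + (1/2) sum_ij H_ij c_ij."""
    p = len(c)
    first = sum(grad_h[i] * c[i] for i in range(p))
    second = sum(hess_h[i][j] * c_mat[i][j] for i in range(p) for j in range(p))
    return first + 0.5 * second


def limit_scaled_cov(grad_h, grad_g, c_mat):
    """Right-hand side of (5.5): sum_ij H_i G_j c_ij."""
    p = len(grad_h)
    return sum(grad_h[i] * grad_g[j] * c_mat[i][j]
               for i in range(p) for j in range(p))


def exact_scaled_mean_cosine(n, sigma2):
    """N * E[1 - cos(y)] for y ~ Normal(0, sigma2 / n).

    Uses E[cos(y)] = exp(-var(y) / 2) for a centred normal variable.
    """
    return n * (-math.expm1(-sigma2 / (2.0 * n)))
```

For $H(Y) = 1 - \cos Y$ the first derivative at the origin is 0 and the second is 1, so (5.4) gives $\sigma^2/2$; `exact_scaled_mean_cosine` approaches the same value as `n` grows.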


6. The estimation of the autocorrelation coefficients.
PROPOSITION 15. If, under the assumptions of Section 2,

$$\rho^*_{k,N} = \frac{C^*_{k,N}}{(A^*_{k,N}\, B^*_{k,N})^{1/2}}, \tag{6.1}$$

where $C^*_{k,N}$ is given by (3.8) and


(62) Adv = 1S | ¥ Avo |, Bow = : |» hk = Bo; OF 
V t=) 


V t=l =) =m 


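then

$$\lim_{N\to\infty} N E(\rho^*_{k,N} - \rho_k) = -(m + 1)(1 - \rho_k) \sum_{q=-\infty}^{+\infty} \rho_q + 2 \sum_{q=-\infty}^{+\infty} (\rho_k \rho_q^2 - \rho_q \rho_{q+k}). \tag{6.3}$$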







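PROOF. Let

$$H(Y_1, Y_2, Y_3) = \frac{Y_1 + R_k}{(Y_2 + R_0)^{1/2} (Y_3 + R_0)^{1/2}} - \frac{R_k}{R_0}. \tag{6.4}$$

According to (6.1),

$$\rho^*_{k,N} - \rho_k = H(y_1^{(N)}, y_2^{(N)}, y_3^{(N)}),$$

where

$$y_1^{(N)} = C^*_{k,N} - R_k; \qquad y_2^{(N)} = A^*_{k,N} - R_0; \qquad y_3^{(N)} = B^*_{k,N} - R_0. \tag{6.5}$$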


The assumptions of Theorem 14 are clearly satisfied; (5.3) follows from Proposition 13, while the uniform boundedness of $H(y_1^{(N)}, y_2^{(N)}, y_3^{(N)})$ follows from the Schwarz Inequality. In view of Proposition 7,

$$c_1 = -(m + 1) \sum_{q=-\infty}^{+\infty} R_q.$$
Since $A^*_{k,N} = C^*_{0,\nu}$, and since $B^*_{k,N}$ is an estimator of $R_0$ analogous to $C^*_{0,\nu}$, applied to the process $\{x_{t+k}\}$,

$$c_2 = c_3 = c_1.$$

For the same reasons, Corollary 10 applies not only to $y_1^{(N)}$, but to $y_2^{(N)}$ and $y_3^{(N)}$, as well; hence


(6.6) 


On the other hand, 


s m™m Pe c 1 m 
Aow = R; oc J > a; Q; ; Cin = Ri» — 2 > aib;Q; , 


V i=l V i=0 


5 N m 
»* r 2 l 2 1 2 
Be y= is = t oe - py Zz = bo bQ; ° 
v V t=N—k+1 


V i=O 


By an argument similar to that used in the proof of Proposition 9, we find 


var (- z aQ) =0O (4) and var ¢ > b3 @.) = 0 (4) : 
i=0 e ) i= : 


hence, and from (4.10), by applying the Schwarz and Triangle Inequalities, we 
obtain 


N cov (Aj. ; Bow) = N var Ro» + O(1/r”), 

N cov (Atw ; Chw) = N cov (Ro» ; Ri») + O(1/r*”), 

N cov (Boy ; Chx) = N cov (Ro» ; Ri») + O(1/r"”). 
In view of (2.7), this implies

$$c_{12} = c_{13} = v_{0k} \quad \text{and} \quad c_{23} = v_{00}. \tag{6.7}$$







Substituting in (5.4) the values of the partial derivatives of (6.4) at the point $Y_1 = Y_2 = Y_3 = 0$, as well as the values obtained above for $c_i$ and $c_{ij}$ $(i, j = 1, 2, 3)$, we find

$$\lim_{N\to\infty} N E(\rho^*_{k,N} - \rho_k) = -(m + 1) \frac{1}{R_0^2} \sum_{q=-\infty}^{+\infty} R_q (R_0 - R_k) + \frac{2}{R_0^3} \sum_{q=-\infty}^{+\infty} (R_k R_q^2 - R_0 R_q R_{q+k}),$$

which is equivalent to (6.3).

Thus the bias of the estimators of the autocorrelation function is composed of 
two parts: one which is due to the bias in the covariance estimators (and is, 
therefore, proportional to m + 1), and another which is a result of the correla- 
tion between the numerator and the denominator, and is still there when there 
is no trend to eliminate. (See Corollary 17 below.) 
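The two parts of the bias can be evaluated numerically for any absolutely summable autocorrelation function. The sketch below is an illustration only: it assumes the limit has the form $-(m+1)(1-\rho_k)\sum_q \rho_q + 2\sum_q(\rho_k\rho_q^2 - \rho_q\rho_{q+k})$, truncates the infinite sums at a hypothetical cut-off `q_max`, and uses a first-order autoregressive autocorrelation $\rho_q = \phi^{|q|}$ as a test case (this example process is our assumption, not one treated in the text).

```python
def ar1_autocorrelation(phi):
    """Autocorrelation function rho_q = phi**|q| of a first-order autoregression."""
    return lambda q: phi ** abs(q)


def asymptotic_bias_parts(rho, k, m, q_max=200):
    """Return the two parts of lim N * E(rho*_k - rho_k).

    trend_part : -(m + 1) (1 - rho_k) sum_q rho_q   (bias of the covariance estimators)
    ratio_part : 2 sum_q (rho_k rho_q^2 - rho_q rho_{q+k})   (numerator/denominator correlation)
    The sums over q are truncated to -q_max..q_max.
    """
    qs = range(-q_max, q_max + 1)
    sum_rho = sum(rho(q) for q in qs)
    sum_sq = sum(rho(q) ** 2 for q in qs)
    sum_lag = sum(rho(q) * rho(q + k) for q in qs)
    trend_part = -(m + 1) * (1.0 - rho(k)) * sum_rho
    ratio_part = 2.0 * (rho(k) * sum_sq - sum_lag)
    return trend_part, ratio_part


trend, ratio = asymptotic_bias_parts(ar1_autocorrelation(0.5), k=1, m=0)
```

With $\phi = 0.5$, $k = 1$ and $m = 0$ the two parts are $-(1 + \phi) = -1.5$ and $-2\phi = -1$, so the total is $-(1 + 3\phi)$; the second part alone is what remains when there is no trend to eliminate.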

REMARK 16. If $m = 0$, (6.3) yields the bias of the estimator $\rho^*_{k,N}$ of the autocorrelation function when the process is stationary with an unknown mean. This result can also be obtained directly from the formulae given by Kendall [4].


It is sufficient to note that Kendall’s Eqs. (7) and (8) can be written in the 
following form: 


— ‘ v l 5 
‘ sean }1 
E(A) = pe - (v — j) press? 3 E(B) = 1 —-—-<- Ms (vy — j)p;? , 
\ ’ V \V jmi—» 

where the expressions in curly brackets, as partial Cesàro sums of the infinite series $\sum_{j=-\infty}^{\infty} \rho_{k+j}$ and $\sum_{j=-\infty}^{\infty} \rho_j$ respectively, tend to $\sum_{j=-\infty}^{\infty} \rho_j$.

COROLLARY 17. If, the assumptions of Section 2 being satisfied, $f(t) \equiv 0$, and if the estimation of the autocorrelation function is based on the assumption that there is no trend, i.e., if we make


’ Ciw 


2° Pin = N—k 1/2 N 


V t= V t=k+1 


then

$$\lim_{N\to\infty} N E(\rho'_{k,N} - \rho_k) = 2 \sum_{q=-\infty}^{+\infty} (\rho_k \rho_q^2 - \rho_q \rho_{q+k}).$$

The proof is entirely similar to that of Proposition 15, with the exception that now $c_1 = c_2 = c_3 = 0$.


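PROPOSITION 18. Under the assumptions of Section 2,

$$\lim_{N\to\infty} N \operatorname{cov}(\rho^*_{k,N};\ \rho^*_{l,N}) = \sum_{q=-\infty}^{+\infty} (\rho_q \rho_{q+k-l} + \rho_{q+k} \rho_{q-l} - 2\rho_l \rho_q \rho_{q+k} - 2\rho_k \rho_q \rho_{q+l} + 2\rho_k \rho_l \rho_q^2). \tag{6.9}$$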







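PROOF. Retaining the notation of the proof of Proposition 15, make

$$G(Y_4, Y_5, Y_6) = \frac{Y_4 + R_l}{(Y_5 + R_0)^{1/2} (Y_6 + R_0)^{1/2}} - \frac{R_l}{R_0},$$

and

$$y_4^{(N)} = C^*_{l,N} - R_l; \qquad y_5^{(N)} = A^*_{l,N} - R_0; \qquad y_6^{(N)} = B^*_{l,N} - R_0,$$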
where 


9? 


Atx = lv _ 2 Ano | Mat > |v _ 2 Bvt | 


V t=l . 
so that

$$G(y_4^{(N)}, y_5^{(N)}, y_6^{(N)}) = \rho^*_{l,N} - \rho_l.$$

Then Theorem 14 can be applied if $H$ and $G$ are regarded each as a function of the six variables $Y_1, Y_2, \ldots, Y_6$.

According to Proposition 9, $c_{14} = v_{kl}$, and, by an argument similar to that used in proving (6.7), we find


Cp = Cre = Vu, 


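Hence, according to (5.5),

$$\lim_{N\to\infty} N \operatorname{cov}(\rho^*_{k,N};\ \rho^*_{l,N}) = \frac{1}{R_0^4} \sum_{q=-\infty}^{+\infty} [R_0^2 R_q R_{q+k-l} + R_0^2 R_{q+k} R_{q-l} - 2R_0 R_l R_q R_{q+k} - 2R_0 R_k R_q R_{q+l} + 2R_k R_l R_q^2],$$

which is equivalent to (6.9).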
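The limiting covariance can be evaluated by truncating the sum over $q$. The following sketch is illustrative only: it assumes the summand $\rho_q\rho_{q+k-l} + \rho_{q+k}\rho_{q-l} - 2\rho_l\rho_q\rho_{q+k} - 2\rho_k\rho_q\rho_{q+l} + 2\rho_k\rho_l\rho_q^2$, and the function names, the cut-off `q_max` and the two example autocorrelation functions are our own assumptions.

```python
def limiting_covariance(rho, k, l, q_max=200):
    """Truncated evaluation of lim N cov(rho*_k, rho*_l) for autocorrelation rho."""
    total = 0.0
    for q in range(-q_max, q_max + 1):
        total += (rho(q) * rho(q + k - l)
                  + rho(q + k) * rho(q - l)
                  - 2.0 * rho(l) * rho(q) * rho(q + k)
                  - 2.0 * rho(k) * rho(q) * rho(q + l)
                  + 2.0 * rho(k) * rho(l) * rho(q) ** 2)
    return total


def white_noise(q):
    """Autocorrelation of an uncorrelated process."""
    return 1.0 if q == 0 else 0.0


def ar1(phi):
    """Autocorrelation rho_q = phi**|q| of a first-order autoregression."""
    return lambda q: phi ** abs(q)
```

For an uncorrelated process the sketch gives limiting variance 1 and zero covariance between different lags; for $\rho_q = \phi^{|q|}$ and $k = l = 1$ it gives $1 - \phi^2$.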
COROLLARY 19. Under the assumptions of Corollary 17,

$$\lim_{N\to\infty} N \operatorname{cov}(\rho'_{k,N};\ \rho'_{l,N}) = \lim_{N\to\infty} N \operatorname{cov}(\rho^*_{k,N};\ \rho^*_{l,N}). \tag{6.10}$$

(See [1], [2].)
Finally, it should be noted that there is no difficulty in applying, with the necessary modifications, the same method to the investigation of other covariance and autocorrelation estimators, e.g., to the circular estimator of the autocorrelation function, or to $C^*_{k,N} / C^*_{0,N}$.


7. Acknowledgement. The authors wish to thank Mr. J. D. North, Chair- 
man and Managing Director of Boulton Paul Aircraft, Ltd., for his permission 
to publish the present note which originated in an attempt to solve a more 
general problem proposed by him. 


REFERENCES 


[1] M. S. Bartlett, "On the theoretical specification and sampling properties of autocorrelated time series", J. Roy. Stat. Soc., Suppl. Vol. 8, No. 1 (1946), pp. 27-41.
[2] M. S. Bartlett, An Introduction to Stochastic Processes, Cambridge University Press, 1955.
[3] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.
[4] M. G. Kendall, "Note on bias in the estimation of autocorrelation", Biometrika, Vol. 41 (1954), pp. 403-404.
[5] M. G. Kendall, The Advanced Theory of Statistics, Vol. II, Ch. Griffin & Co., London, 1946.
[6] K. Knopp, Theorie und Anwendung der Unendlichen Reihen, 4-te Auflage, Springer, Berlin, 1947.
[7] Z. A. Lomnicki and S. K. Zaremba, "Some applications of zero-one processes", J. Roy. Stat. Soc. Ser. B, Vol. 17, No. 2, pp. 243-255.
[8] F. H. C. Marriott and J. A. Pope, "Bias in the estimation of autocorrelations", Biometrika, Vol. 41 (1954), pp. 390-402.





APPROXIMATIONS TO THE POWER OF RANK TESTS 
By CHIA KUEI TSAO


Wayne State University 


Summary. Proposed is a method for approximating the distribution of the 
ranks, which is the basis for evaluating the power of an arbitrary rank test (see 
definition of ‘‘rank test’’ in Section 2 below). The method involves, in essence, 
a transformation of the original distributions, by means of interpolating poly- 
nomials, into distributions defined on the unit interval (0, 1). A somewhat 
detailed discussion is given to the problem of testing the hypothesis that two 
populations are identical against the alternative hypothesis that they have two 
specified (non-identical) distributions. Explicit formulas for approximating 
the distributions of the ranks under the alternative hypothesis are given. A 
few tables are computed for the case where both distributions are normal with 
the same variance but different means. 

The last section is devoted to the investigation of the asymptotic power 
efficiency of certain rank tests. 


1. Introduction. A number of rank tests [8] have been proposed for testing the 
hypothesis that several populations are identical. The power of such (and other) 
rank tests is determined by the distribution P(R) of the ranks under the alter- 
native hypothesis. In [5] Hoeffding gives a simple formula for P(R) for an arbi- 
trary alternative. However, the difficulty in evaluating this formula makes it 
hard to obtain the exact power results, except in special cases. Fortunately, this 
difficulty has been partially overcome by other means. As examples, we mention 
here a few known results. Terry [10] investigated empirically the power of
Hoeffding's $c_1(R)$ test [5] against normal alternatives for the two-sample case.
Lehmann [6] investigated the power of several two-sample rank tests against
some nonparametric alternatives. Dixon [1] obtained, by numerical methods,
some power curves of several two-sample rank tests for normal alternatives.
Recently, Teichroew [9] obtained a few empirical power curves for the same 
alternatives. 

We shall in this paper investigate the power of an arbitrary rank test (see
definition of "rank test" in Section 2) against an arbitrary alternative hypothesis.
Since no method has yet been found for evaluating analytically the distribution
P(R) of the ranks, we shall here propose an approximation procedure which,
as we shall see, appears to be quite satisfactory in certain cases. The computations
are carried through for a few examples. However, in order to make practical
use of the method, much more systematic computation is required.

For convenience we shall here make the assumption, to hold throughout this paper, 
that all distribution functions that we consider are to be continuous. Furthermore, 

Received February 20, 1956; revised July 5, 1956. 

all definitions and notations given in the following sections will remain the same 
throughout the discussion. 


2. Rank tests and power of rank tests. The term "rank test" is used here in a rather broad sense. For testing the hypothesis that several populations are identical, a test will be called a "rank test," if it is based entirely on the ranks of the random variables. The following consideration will illustrate the point.

Let

$$F_0, \ldots, F_k \tag{2.1}$$

be $k + 1$ continuous univariate cumulative distribution functions (cdf's) defined over the infinite interval $(-\infty, \infty)$ (or over a finite or half-infinite interval $(a, b)$). Let

$$Z = (Z_0, \ldots, Z_k) = (Z_{01}, \ldots, Z_{0m_0}, \ldots, Z_{k1}, \ldots, Z_{km_k}), \tag{2.2}$$

where

$$Z_i = (Z_{i1}, \ldots, Z_{im_i}) \tag{2.3}$$

are the ordered values of $m_i$ independent random variables distributed according to $F_i(z)$, $i = 0, \ldots, k$; that is, for each $i$ $(i = 0, \ldots, k)$, we have

$$Z_{i1} < \cdots < Z_{im_i}. \tag{2.4}$$

Let

$$\theta = (\theta_0, \ldots, \theta_k) = (\theta_{01}, \ldots, \theta_{0m_0}, \ldots, \theta_{k1}, \ldots, \theta_{km_k}), \tag{2.5}$$

where

$$\theta_i = (\theta_{i1}, \ldots, \theta_{im_i}) \tag{2.6}$$

are the ranks of the $m_i$ variables $Z_i$ in $Z$; obviously, for each $i$ $(i = 0, \ldots, k)$, we always have

$$\theta_{i1} < \cdots < \theta_{im_i}. \tag{2.7}$$

Let

$$R = (r_{01}, \ldots, r_{0m_0}, \ldots, r_{k1}, \ldots, r_{km_k}) \tag{2.8}$$

be a permutation of the first $M = m_0 + \cdots + m_k$ positive integers $(1, \ldots, M)$ such that for each $i$ $(i = 0, \ldots, k)$, we have

$$r_{i1} < \cdots < r_{im_i}. \tag{2.9}$$

It is evident that there are $\eta = M!/\prod m_i!$ permutations $R$. Let $\Omega$ denote the class of all possible sets $\omega$ of such permutations $R$. Then, $\theta$ is a random variable over $\Omega$.

Now, suppose the hypothesis

$$H_0: F_0 = \cdots = F_k \tag{2.10}$$

is to be tested against the alternative hypothesis

$$H_1: F_i = F_i^*, \qquad i = 0, \ldots, k, \tag{2.11}$$

where $F_0^*, \ldots, F_k^*$ are $k + 1$ given cdf's. We shall denote by $h(R; H_i)$ the probability distribution function (pdf) of $\theta$ under the hypothesis $H_i$; that is,

$$h(R; H_i) = \Pr(\theta = R \mid H_i), \qquad i = 0, 1. \tag{2.12}$$
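Let us denote the $\eta$ permutations $R$ by $R_1, \ldots, R_\eta$ in such a way that

$$h(R_{i+1}; H_1) \le h(R_i; H_1), \qquad i = 1, \ldots, \eta - 1. \tag{2.13}$$

Let $a_1, \ldots, a_t$ be $t$ positive integers such that

$$a_1 + \cdots + a_t = \eta. \tag{2.14}$$

Let

$$\begin{aligned} S_1 &= R_1 \cup \cdots \cup R_{a_1}, \\ S_2 &= R_{a_1+1} \cup \cdots \cup R_{a_1+a_2}, \\ &\ \ \vdots \\ S_t &= R_{a_1+\cdots+a_{t-1}+1} \cup \cdots \cup R_\eta. \end{aligned} \tag{2.15}$$

Let $X$ be a random variable derived from the random variable $\theta$ such that

$$X = j \quad \text{if } \theta \in S_j, \qquad j = 1, \ldots, t. \tag{2.16}$$

In other words, the group of $k + 1$ random samples of sizes $(m_0, \ldots, m_k)$ will be designated by $j$ $(j = 1, \ldots, t)$ if the rank set $\theta$ of the sample values is in $S_j$. Clearly, under $H_i$ $(i = 0, 1)$, the random variable $X$ is distributed according to

$$\Pr(X = j \mid H_i) = p_{ij}, \qquad j = 1, \ldots, t, \tag{2.17}$$

where

$$\begin{aligned} p_{i1} &= h(R_1; H_i) + \cdots + h(R_{a_1}; H_i), \\ p_{i2} &= h(R_{a_1+1}; H_i) + \cdots + h(R_{a_1+a_2}; H_i), \\ &\ \ \vdots \\ p_{it} &= h(R_{a_1+\cdots+a_{t-1}+1}; H_i) + \cdots + h(R_\eta; H_i). \end{aligned} \tag{2.18}$$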


Now, according to the definition, there are many possible rank tests for test- 
ing Hy against H, , among which we mention the following three classes: 

(a) Univariate rank (UR) tests. This class includes all tests which are based 
on a single random variable X having the pdf (2.17). 

(b) Multivariate rank (MR) tests. This class includes all tests which are based on $g$ $(g \ge 2)$ independent and identically distributed random variables $X_1, \ldots, X_g$ having the common pdf (2.17).

(c) Sequential rank (SR) tests. This class includes all tests which are based on a sequence of successive independent, identically distributed random variables $X_1, X_2, \ldots$, each having the pdf (2.17).

Many rank tests have been proposed in the past [8]. While most of them are 
UR tests, a few (e.g., the sign test [3] and the sequential probability ratio test 
[15, Ch. 6]) are MR and SR tests. Other MR and SR tests can be obtained by 
the use of various goodness of fit criteria, such as the rank sum and the sequen- 
tial rank sum tests introduced in [11] and [13]. 

Since the asymptotic efficiency of the rank sum test will be investigated in 
Section 6, it is convenient at this point to describe this test in some detail. 

To employ the rank sum test, we shall choose the constants $a_1, \ldots, a_t$ in such a way that they are all equal and use

$$s = X_1 + \cdots + X_g \tag{2.19}$$

as a test criterion. It is shown in [12] that under $H_i$ the probability distribution function of $s$, denoted by $q(y; p_i, g)$, satisfies the following recursion formula

$$q(y; p_i, n) = \sum_{j=1}^{t} p_{ij}\, q(y - j; p_i, n - 1), \tag{2.20}$$

with the initial conditions

$$q(y; p_i, 1) = \begin{cases} p_{iy} & \text{if } y = 1, \ldots, t, \\ 0 & \text{otherwise,} \end{cases} \tag{2.21}$$

where

$$p_i = (p_{i1}, \ldots, p_{it}), \qquad i = 0, 1. \tag{2.22}$$

Using this criterion, a critical region of a one-sided test would consist of $g, g + 1, \ldots, g + r$, where $r$ is a non-negative integer so determined that

$$Q(g + r; p_0, g) = \sum_{j=g}^{g+r} q(j; p_0, g) \le \alpha \tag{2.23}$$

for a predetermined level of significance $\alpha$. The power of this test is, of course, given by

$$Q(g + r; p_1, g) = \sum_{j=g}^{g+r} q(j; p_1, g). \tag{2.24}$$

For an equal tailed two-sided test, a critical region would consist of $g, g + 1, \ldots, g + r^*, gt - r^*, \ldots, gt$, where $r^*$ is a non-negative integer so determined that (2.23) holds with $r$ replaced by $r^*$ and $\alpha$ by $\alpha/2$. The power of such a two-sided test is given by

$$1 + Q(g + r^*; p_1, g) - Q(gt - r^* - 1; p_1, g).$$

We note that when $t = 2$, these tests reduce to the binomial tests and that for $3 \le t \le 6$, the function $Q(y; p_0, g)$ is tabulated in [12] for $1 \le g \le 20$.
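The recursion for the distribution of the rank sum is a repeated convolution and is simple to carry out by machine. The following is a minimal sketch under our own naming (the functions `rank_sum_pmf` and `lower_tail` are hypothetical); it assumes the initial condition that a single $X$ takes the value $j$ with probability $p_{ij}$, and that $Q(y; p, g)$ is the lower-tail sum of $q$ from $g$ up to $y$.

```python
def rank_sum_pmf(p, g):
    """Distribution of s = X_1 + ... + X_g, each X_i taking value j with probability p[j - 1].

    Returns a dict mapping each attainable value y (g <= y <= g * t) to q(y; p, g).
    """
    t = len(p)
    q = {j: p[j - 1] for j in range(1, t + 1)}   # n = 1: a single X
    for _ in range(g - 1):                        # add one X at a time
        nxt = {}
        for y, prob in q.items():
            for j in range(1, t + 1):
                nxt[y + j] = nxt.get(y + j, 0.0) + p[j - 1] * prob
        q = nxt
    return q


def lower_tail(p, g, y):
    """Q(y; p, g): probability that the rank sum of g groups is at most y."""
    return sum(prob for s, prob in rank_sum_pmf(p, g).items() if s <= y)


# t = 2 with equal probabilities reduces to a binomial distribution on 3..6.
pmf = rank_sum_pmf([0.5, 0.5], 3)
```

Evaluating `lower_tail` with the null probabilities gives the size of a one-sided critical region, and with the alternative probabilities its power.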
Having indicated the scope of rank tests, we shall be concerned mainly with the evaluation of the power of such tests. Clearly, in order to evaluate the power of any rank test, it is necessary to find first the probability distribution function $h(R; H_1)$. The main purpose of this paper is, therefore, to develop an approximation procedure for evaluating $h(R; H_1)$. This procedure is quite effective when $\eta$ is small. For large $\eta$, the approximation of the individual terms $h(R; H_1)$ becomes tedious, since there are too many entries involved. For example, in the two-sample case, there are 252 entries when $m_0 = m_1 = 5$, and the number of entries rises to 12870 when $m_0 = m_1 = 8$; in the three-sample case, even when $m_0 = m_1 = m_2 = 3$, the number of entries is already 1680. Consequently, when the distance between $H_0$ and $H_1$ (in a suitable sense) is so small that no sensitive UR test based on small $\eta$ can be found, an MR or SR test based on a large number of groups of small samples may be recommendable, especially when it has certain other preferable properties.
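The entry counts quoted above are multinomial coefficients $M!/\prod m_i!$ and can be checked directly. The helper name below is hypothetical.

```python
import math


def number_of_rankings(sizes):
    """Number of distinct rank arrangements M! / (m_0! m_1! ... m_k!)."""
    count = math.factorial(sum(sizes))
    for m in sizes:
        count //= math.factorial(m)   # each partial quotient is an integer
    return count


counts = [number_of_rankings(s) for s in [(5, 5), (8, 8), (3, 3, 3)]]
```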


3. Rank preserving transformations and equivalent tests. Our first step in evaluating the pdf $h(R; H_1)$ is to transform the original distributions $F_0^*, \ldots, F_k^*$ into some new distributions defined over the unit interval (0, 1).

Let $T(z)$ be a continuous, strictly increasing function defined over the infinite interval $(-\infty, \infty)$ such that $T(-\infty) = 0$, $T(\infty) = 1$. Let

$$V = (V_0, \ldots, V_k) = (V_{01}, \ldots, V_{0m_0}, \ldots, V_{k1}, \ldots, V_{km_k}), \tag{3.1}$$

where

$$V_i = (V_{i1}, \ldots, V_{im_i}) = (T(Z_{i1}), \ldots, T(Z_{im_i})) \tag{3.2}$$

are the ordered values of $m_i$ independent random variables distributed according to

$$\Psi_i(v) = \Psi_i(T(z)) = F_i(z), \qquad i = 0, \ldots, k. \tag{3.3}$$

Let

$$\theta' = (\theta'_0, \ldots, \theta'_k) = (\theta'_{01}, \ldots, \theta'_{0m_0}, \ldots, \theta'_{k1}, \ldots, \theta'_{km_k}), \tag{3.4}$$

where

$$\theta'_i = (\theta'_{i1}, \ldots, \theta'_{im_i}) \tag{3.5}$$

are the ranks of $V_i$ in $V$. Then, clearly, $\theta'$ is also a random variable over $\Omega$, and has the same distribution as $\theta$ ([6], Theorem 8.1). It follows that a rank test for testing $H_0$ against $H_1$ is equivalent to the same rank test for testing the hypothesis

$$H_0: \Psi_0(v) = \cdots = \Psi_k(v) \tag{3.6}$$

against the alternative hypothesis

$$H_1: \Psi_i(v) = \Phi_i(v), \qquad i = 0, \ldots, k, \tag{3.7}$$

where

$$\Phi_i(v) = \Phi_i(T(z)) = F_i^*(z), \qquad i = 0, \ldots, k. \tag{3.8}$$

We shall assume that the functions $\Phi_0(v), \ldots, \Phi_k(v)$ are differentiable on (0, 1) with continuous derivatives $\varphi_0(v), \ldots, \varphi_k(v)$ respectively. That is,

$$\varphi_i(v) = \Phi_i'(v), \qquad i = 0, \ldots, k. \tag{3.9}$$
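The invariance of the ranks under a continuous, strictly increasing transformation can be illustrated on a small example. In the sketch below the choice of $T$ as the standard normal cdf, the helper names and the sample values are all our own assumptions.

```python
import math


def normal_cdf(z, mean=0.0, sd=1.0):
    """Cumulative normal distribution N(z; mean, sd), used here as the transformation T."""
    return 0.5 * (1.0 + math.erf((z - mean) / (sd * math.sqrt(2.0))))


def pooled_ranks(samples):
    """Ranks of each sample's values within the pooled, ordered observations.

    Returns one increasing tuple of ranks per sample (ties are assumed absent).
    """
    pooled = sorted((x, i) for i, sample in enumerate(samples) for x in sample)
    ranks = [[] for _ in samples]
    for rank, (_, i) in enumerate(pooled, start=1):
        ranks[i].append(rank)
    return [tuple(r) for r in ranks]


original = [[-1.2, 0.3, 2.0], [-0.5, 0.9]]
transformed = [[normal_cdf(z) for z in sample] for sample in original]
```

The rank vectors of `original` and `transformed` coincide, while the transformed values all lie in the unit interval.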


4. Polynomial approximation of a cumulative distribution function. Our next step is to find polynomial approximations for $\Phi_0(v), \ldots, \Phi_k(v)$. That is, for a given cdf $F^*(z)$ and a given transformation $T(z)$, an interpolating polynomial of the function

$$\Phi(v) = \Phi(T(z)) = F^*(z) \tag{4.1}$$

is to be found. Now, letting $0 = v_0 < v_1 < \cdots < v_h = 1$ be $h + 1$ chosen points on the unit interval (0, 1), we shall find a polynomial $P(v)$ of degree $h$ such that

$$P(v_r) = \Phi(v_r), \qquad r = 0, \ldots, h. \tag{4.2}$$

Since, in this case, the function $\Phi(v)$ is a cdf so that we always have

$$\Phi(0) = 0, \qquad \Phi(1) = 1, \tag{4.3}$$

then, $P(v)$ must have the following form

$$P(v) = a_1 v + a_2 v^2 + \cdots + a_h v^h \tag{4.4}$$

with real coefficients $a_1, \ldots, a_h$. The derivative of $P(v)$ will be written as

$$p(v) = b_0 + b_1 v + \cdots + b_q v^q, \tag{4.5}$$

where, of course,

$$q = h - 1, \qquad b_i = (i + 1) a_{i+1}, \qquad i = 0, \ldots, q. \tag{4.6}$$

In the following, we shall derive some formulae for determining the coefficients $a_1, \ldots, a_h$ in terms of a given set of values $(v_0, \Phi(v_0)), \ldots, (v_h, \Phi(v_h))$. Our derivations are based on Lagrange's interpolating polynomials. It should be pointed out, however, that equivalent formulae can be obtained by other methods (e.g., by means of Newton's interpolating polynomials [14]).

Suppose we let

$$\pi_h(v) = (v - v_0)(v - v_1) \cdots (v - v_h) \tag{4.7}$$

and let $\pi_h'(v)$ be the derivative of $\pi_h(v)$; we then obtain

THEOREM 4.1. For any given set of values $(v_0, \Phi(v_0)), \ldots, (v_h, \Phi(v_h))$, the coefficients $a_1, \ldots, a_h$ of the polynomial $P(v)$ in (4.4) can be written as

$$a_r = \sum_{s=1}^{h} \frac{C_{rs}\, \Phi(v_s)}{\pi_h'(v_s)}, \qquad r = 1, \ldots, h, \tag{4.8}$$

where $C_{rs}$ is the coefficient of $v^r$ in the expansion of

$$\pi_h(v)/(v - v_s), \qquad s = 1, \ldots, h. \tag{4.9}$$

PROOF. From [7, p. 83], Lagrange's interpolating polynomial is given by

$$P(v) = \sum_{s=0}^{h} \Phi(v_s)\, L_s^{(h)}(v), \tag{4.10}$$

where

$$L_s^{(h)}(v) = \pi_h(v)/[(v - v_s)\, \pi_h'(v_s)], \qquad s = 0, \ldots, h. \tag{4.11}$$

But this can be expanded into polynomial form as

$$L_s^{(h)}(v) = \Big[\sum_{r=1}^{h} C_{rs}\, v^r\Big] \Big/ \pi_h'(v_s), \qquad s = 1, \ldots, h. \tag{4.12}$$

By substituting (4.12) into (4.10), it is easily seen that the coefficients $a_1, \ldots, a_h$ are given by (4.8).

COROLLARY 4.1. If $(v_0, \ldots, v_h)$ are equally spaced, the coefficients $a_1, \ldots, a_h$ can be written as

$$a_r = \frac{h^r}{h!} \sum_{s=1}^{h} (-1)^{h-s} \binom{h}{s} c_{rs}\, \Phi\Big(\frac{s}{h}\Big), \tag{4.13}$$

where $c_{rs}$ is the coefficient of $y^r$ in the expansion of

$$\prod_{j=0}^{h} (y - j) \Big/ (y - s). \tag{4.14}$$

PROOF. Since, by assumption, we have

$$v_r = r/h, \tag{4.15}$$

this corollary follows directly from Theorem 4.1.

Consider now a simple application of Corollary 4.1 to the normal alternatives. Let $N(z; A, B)$ denote the cumulative normal distribution with mean $A$ and standard deviation $B$. Let the transformation be given by

$$v = T(z) = N(z; \mu, \sigma). \tag{4.16}$$

Some approximating polynomials of degree five are obtained for the normal alternatives

$$\Phi(v) = \Phi(N(z; \mu, \sigma)) = N(z; \mu + d\sigma, \sigma). \tag{4.17}$$

Table 1 gives the coefficients $a_1, \ldots, a_5$ for the cases $d$ = 0, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50.

5. Two-sample problem. Applying the results of Sections 3 and 4 to Hoeffding's formula ([5], p. 88), we shall now find the approximate values of $h(R; H_1)$ for the two-sample case ($k = 1$). Since, in the two-sample case, the complete set of ranks $\theta$ is determined by the ranks of $Z_{11}, \ldots, Z_{1m_1}$ alone, we need consider only the distribution of the ranks

$$\theta_1 = (\theta_{11}, \ldots, \theta_{1m_1}). \tag{5.1}$$

Consequently, if we let

$$S = (s_1, \ldots, s_{m_1}) \tag{5.2}$$

be the permutation of $m_1$ out of the first $m_0 + m_1$ positive integers such that

$$s_1 < \cdots < s_{m_1}, \tag{5.3}$$

then, our sample space $\Omega$ may be considered as being made up of all subsets $\omega$ of such permutations $S$.

TABLE 1
Coefficients of Approximating Polynomials of Degree Five for Normal Alternatives $N(z; \mu + d\sigma, \sigma)$

   d        a_1          a_2          a_3           a_4          a_5
  0       1            0            0             0            0
   .25     .590656      .512090     -.083386      -.345181      .325821
   .50     .355264      .307583     1.195227     -2.174861     1.316787
   .75     .256026     -.409675     3.578142     -5.416004     2.991511
  1.00     .255337    -1.475861     6.873877     -9.976240     5.322887
  1.25     .319949    -2.759978    10.879441    -15.632358     8.192946
  1.50     .422091    -4.146164    15.319144    -21.979847    11.384776

To approximate $h(R; H_1)$, we shall employ the convenient transformation

$$v = F_0^*(z); \tag{5.4}$$

that is, we may assume $\Phi_0(v) = v$.


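THEOREM 5.1. Let the interpolating polynomial $P(v)$ of the function

$$\Phi_1(v) = \Phi_1(F_0^*(z)) = F_1^*(z) \tag{5.5}$$

and the derivative $p(v)$ be given by (4.4) and (4.5).

Then, the approximate value of $h(R; H_1)$ is given by

$$h(R; H_1) \approx \binom{m_0 + m_1}{m_1}^{-1} \sum b_0^{n_0} b_1^{n_1} \cdots b_q^{n_q}\, C(S; n_0, \ldots, n_q), \tag{5.6}$$

where the sum $\sum$ is taken over all possible $(n_0, \ldots, n_q)$ such that $n_0 + \cdots + n_q = m_1$, and where

$$C(S; n_0, \ldots, n_q) = \frac{\Gamma(s_{m_1+1})}{\Gamma(s_1)} {\sum}' \prod_{j=1}^{m_1} \frac{\Gamma(s_j + i_1 + \cdots + i_j)}{\Gamma(s_{j+1} + i_1 + \cdots + i_j)},$$

in which the sum $\sum'$ is extended over all possible $(i_1, \ldots, i_{m_1})$ such that $n_i$ of them are equal to $i$, $i = 0, \ldots, q$, and $s_{m_1+1} = m_0 + m_1 + 1$.

PROOF. By the same argument as in Section 4 of [6], it can be shown that

$$h(R; H_1) = \binom{m_0 + m_1}{m_1}^{-1} \int_0^1 \int_0^{z_{m_1}} \cdots \int_0^{z_2} \frac{(m_0 + m_1)!}{\prod_{j=0}^{m_1} (s_{j+1} - s_j - 1)!} \prod_{j=0}^{m_1} (z_{j+1} - z_j)^{s_{j+1} - s_j - 1} \prod_{j=1}^{m_1} \varphi_1(z_j)\, dz_1 \cdots dz_{m_1},$$

where $s_0 = 0$, $z_0 = 0$ and $z_{m_1+1} = 1$. Since

$$\prod_{j=1}^{m_1} \varphi_1(z_j) \approx \prod_{j=1}^{m_1} (b_0 + b_1 z_j + \cdots + b_q z_j^q) = \sum b_0^{n_0} b_1^{n_1} \cdots b_q^{n_q} {\sum}' z_1^{i_1} \cdots z_{m_1}^{i_{m_1}},$$

we then have

$$\begin{aligned} h(R; H_1) &\approx \binom{m_0 + m_1}{m_1}^{-1} \sum b_0^{n_0} \cdots b_q^{n_q} {\sum}' \int_0^1 \cdots \int_0^{z_2} \frac{(m_0 + m_1)!}{\prod_{j=0}^{m_1} (s_{j+1} - s_j - 1)!} \prod_{j=0}^{m_1} (z_{j+1} - z_j)^{s_{j+1} - s_j - 1} \prod_{j=1}^{m_1} z_j^{i_j}\, dz_1 \cdots dz_{m_1} \\ &= \binom{m_0 + m_1}{m_1}^{-1} \sum b_0^{n_0} \cdots b_q^{n_q}\, \frac{\Gamma(s_{m_1+1})}{\Gamma(s_1)} {\sum}' \prod_{j=1}^{m_1} \frac{\Gamma(s_j + i_1 + \cdots + i_j)}{\Gamma(s_{j+1} + i_1 + \cdots + i_j)}. \end{aligned}$$

This completes the proof of Theorem 5.1.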
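The approximation of Theorem 5.1 amounts to replacing the density of the second sample (after the transformation $v = F_0^*(z)$) by the polynomial $p(v)$ and integrating term by term, each term being a product moment of uniform order statistics. The sketch below is our own illustration of that computation. It assumes the moment formula $E\prod_j U_{(s_j)}^{i_j} = \frac{\Gamma(M+1)}{\Gamma(M+1+I_{m_1})}\prod_j \frac{\Gamma(s_j + I_j)}{\Gamma(s_j + I_{j-1})}$ with $I_j = i_1 + \cdots + i_j$ and $M = m_0 + m_1$, and it takes the $d = 0.25$ coefficients of Table 1 with signs chosen so that $P(1) = 1$ (an assumption). All names are hypothetical.

```python
import math
from itertools import combinations, product


def order_statistic_moment(S, exponents, M):
    """E[prod_j U_(s_j) ** i_j] for uniform order statistics in a sample of size M."""
    value = math.gamma(M + 1)
    running = 0
    for s, i in zip(S, exponents):
        previous = running
        running += i
        value *= math.gamma(s + running) / math.gamma(s + previous)
    return value / math.gamma(M + 1 + running)


def approximate_h(S, b, m0):
    """Approximate probability that the second sample occupies the ranks S.

    b[i] is the coefficient of v**i in the polynomial density p(v).
    """
    m1 = len(S)
    M = m0 + m1
    total = 0.0
    for exponents in product(range(len(b)), repeat=m1):
        coefficient = 1.0
        for i in exponents:
            coefficient *= b[i]
        total += coefficient * order_statistic_moment(S, exponents, M)
    return total / math.comb(M, m1)


def all_rankings(m0, m1):
    """All increasing rank vectors S of the second sample."""
    return list(combinations(range(1, m0 + m1 + 1), m1))


a = [0.590656, 0.512090, -0.083386, -0.345181, 0.325821]   # assumed signs, d = 0.25
b = [(i + 1) * a[i] for i in range(5)]                      # b_i = (i + 1) a_{i+1}
h_0011 = approximate_h((3, 4), b, m0=2)
```

With $m_0 = m_1 = 2$ the value for the ranking 0011 (second sample at ranks 3 and 4) comes out close to the tabulated .22739, and the six probabilities sum to 1.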


COROLLARY 5.1. Under the conditions of Theorem 5.1, the approximate values of $p_{11}, \ldots, p_{1t}$ are given by

$$p_{1j} \approx \binom{m_0 + m_1}{m_1}^{-1} \sum b_0^{n_0} b_1^{n_1} \cdots b_q^{n_q}\, K_j(n_0, n_1, \ldots, n_q), \qquad j = 1, \ldots, t, \tag{5.7}$$

where

$$K_j(n_0, n_1, \ldots, n_q) = \sum_{S \in S_j} C(S; n_0, n_1, \ldots, n_q). \tag{5.8}$$

This corollary is a direct result of Theorem 5.1.

As an application of Theorem 5.1, we shall now consider the problem of testing the hypothesis $H_0$ that two populations are identical against the alternative hypothesis $H_1$ that they are two normal populations with the same variance but different means. More precisely, we have under $H_1$,

$$F_0^*(z) = N(z; \mu, \sigma), \qquad F_1^*(z) = N(z; \mu + d\sigma, \sigma). \tag{5.9}$$

Since, in this case, the function $h(R; H_1)$ depends only on the parameter $d$, we shall write $h(R; d)$ in place of $h(R; H_1)$. Table 2 gives the approximate values of $h(R; d)$ for the cases $(m_0, m_1) = (2, 2), (3, 3)$, where the ranks $R$ of the two samples have been replaced by 0's and 1's, following the conventional notation; that is, 0's represent $Z_0$'s and 1's represent $Z_1$'s. The interpolating polynomials used are those given in Table 1. The ordering is made according to the $c_1(R)$ criterion, that is, the optimum rank order criterion for small $d$ proposed by Hoeffding [5] and studied in detail by Terry [10]. In cases where the $c_1$ value is the same for two or more rankings, however, the order is by increasing probability for the case $d = .25$. We note that, in Table 2 we have defined $R'$ as the ranks of the sample values in the decreasing order; that is, $R'$ is obtained from $R$ by interchanging 0's and 1's.





TABLE 2
Approximate Values of $h(R; d)$ and $h(R'; -d)$

 Ranking R     d = .25     .50       .75       1.00      1.25      1.50

 m_0 = m_1 = 2
 1100          .11727     .07917    .05136    .03218    .01970    .01206
 1010          .13603     .10586    .07893    .05676    .03969    .02725
 1001          .16223     .14964    .13087    .10871    .08607    .06541
 0110          .16307     .15308    .13881    .12293    .10791    .09534
 0101          .19401     .21413    .22391    .22206    .20961    .18958
 0011          .22739     .29812    .37612    .45736    .53702    .61036

 m_0 = m_1 = 3
 (illegible)   (illegible) .01536   .0077(?)  .00370    .00170    .00078
 (illegible)   .03157     .01859    .01025    .00535    .00267    .00132
 (illegible)   .03527     .02313    .01424    .00836    .00479    .00274
 (illegible)   .03532     .02323    .01427    .00823    .00448    .00233
 101010        .03945     .02883    .01968    .01266    .00776    .00459
 110001        .04069     .03049    .02099    .01326    .00770    .00416
 011100        .04098     .03144    .02274    .01582    .01093    .00780
 100110        .04382     .03579    .02759    .02034    .01453    .01018
 101001        .04541     .03771    .02869    .02006    .01296    .00778
 011010        .04580     .03906    .03117    .02350    .01704    .01220
 100101        .05037     .04652    .03957    .03125    .02320    .01645
 010110        .05081     .04828    .04320    .03678    .03004    .02368
 011001        .05267     .05088    .04494    .03635    .02719    .01924
 100011        .05669     .05944    .05782    .05245    .04484    .03680
 001110        .05714     .06166    .06368    .06402    .06364    .06325
 010101        .05836     .06248    .06132    .05528    .04608    .03584
 001101        .06556     .07945    .08934    .09410    .09429    .09174
 010011        .06563     .07955    .08870    .09056    .08427    .07106
 001011        .07367     .10081    .12793    .15056    .16471    .16851
 000111        .08219     .12730    .18613    .25737    .33718    .41955


Although no theoretical results are given as a basis to reveal the error of our approximations, the values for some of the extreme cases in Table 2, that is, for the cases $R = (000111)$ and $R = (111000)$, can be compared with the results of Dixon [1] and Teichroew [9]. These comparisons show that our approximate values and the exact figures agree to two or three decimal places for $d \le 1.25$. This may be considered as quite satisfactory for many practical purposes.


If, however, one wishes to attain more accurate approximations, polynomials 
of degree higher than five must be used. This, of course, would result in more 
extensive computations. 

6. Asymptotic power efficiency of the rank sum test. The purpose of this section is to investigate the asymptotic behavior of the rank sum criterion described in Section 2. As a representative, we shall consider only the problem of testing the hypothesis $H_0$ that two normal populations are identical against the alternative hypothesis $H_1$ that they have the same variance $\sigma^2$ but different means $\nu_0$ and $\nu_1$.

Let $n(x; A, B)$ be the normal density with mean $A$ and standard deviation $B$. Let

$$x = (x_{01}, \ldots, x_{0m_0}, x_{11}, \ldots, x_{1m_1}) \tag{6.1}$$

be a point of an $(m_0 + m_1)$-dimensional Euclidean space and let

$$f(x; d, m_0, m_1) = \prod_{i=0}^{1} \prod_{j=1}^{m_i} n(x_{ij}; \nu_i, \sigma), \tag{6.2}$$

where $\nu_1 = \nu_0 + d\sigma$. Then, $h(R; d)$ can be written as

$$h(R; d) = h(R; d, m_0, m_1) = \binom{m_0 + m_1}{m_1}^{-1} \int \cdots \int_{-\infty < x_1 < \cdots < x_{m_0+m_1} < \infty} (m_0 + m_1)!\, f(x_R; d, m_0, m_1)\, dx, \tag{6.3}$$

where

$$x_R = (x_{01}^{(R)}, \ldots, x_{0m_0}^{(R)}, x_{11}^{(R)}, \ldots, x_{1m_1}^{(R)}) \tag{6.4}$$

is the rearrangement of $x$ according to $R$ (in the obvious manner).

For deriving the power efficiency of a rank sum test, it will also be convenient to write $p_{11}, p_{12}, \ldots, p_{1t}$ as

$$\begin{aligned} p_{11}(d; m_0, m_1, a) &= h(R_1; d, m_0, m_1) + \cdots + h(R_a; d, m_0, m_1), \\ &\ \ \vdots \\ p_{1t}(d; m_0, m_1, a) &= h(R_{a(t-1)+1}; d, m_0, m_1) + \cdots + h(R_{at}; d, m_0, m_1), \end{aligned} \tag{6.5}$$

where $t \ge 2$ and $a \ge 1$ are two integers such that

$$at = \eta = \binom{m_0 + m_1}{m_1}. \tag{6.6}$$

Consider a rank sum test which is based on $g$ groups of two samples of sizes $(m_0, m_1)$ and which is to have the strength $(\alpha_0, \alpha_1)$ for testing $H_0$ against $H_1$ (that is, the probability of rejecting $H_0$ under $H_i$ is $\alpha_i$, $i = 0, 1$). If under $H_i$, the mean and variance of the rank sum statistic $s$ are designated by $\mu_i$ and $\sigma_i^2$ respectively ($i = 0, 1$), then, from (2.4) of [11], we have

$$\begin{aligned} \mu_0 &= g(t + 1)/2, \\ \sigma_0^2 &= g(t^2 - 1)/12, \\ \mu_1 &= g \sum_{j=1}^{t} j\, p_{1j}(d; m_0, m_1, a), \\ \sigma_1^2 &= g \Big[\sum_{j=1}^{t} j^2 p_{1j}(d; m_0, m_1, a) - \Big(\sum_{j=1}^{t} j\, p_{1j}(d; m_0, m_1, a)\Big)^2\Big]. \end{aligned} \tag{6.7}$$


Now, suppose $d$ is small so that $g$ is large; then the distribution of $s$ is approximately normal. Consequently, for a two-sided test, $g$ is to be determined so that

$$N(\mu_0 + z_0 \sigma_0; \mu_1, \sigma_1) - N(\mu_0 - z_0 \sigma_0; \mu_1, \sigma_1) = 1 - \alpha_1, \tag{6.8}$$

from which we obtain

$$g = [B_2(d; m_0, m_1, t, a) / B_1(d; m_0, m_1, t, a)]^2, \tag{6.9}$$

where

$$\begin{aligned} B_1(d; m_0, m_1, t, a) &= (t + 1)/2 - \sum_{j=1}^{t} j\, p_{1j}(d; m_0, m_1, a), \\ B_2(d; m_0, m_1, t, a) &= z_1 \sqrt{\sum_{j=1}^{t} j^2 p_{1j}(d; m_0, m_1, a) - \Big(\sum_{j=1}^{t} j\, p_{1j}(d; m_0, m_1, a)\Big)^2} + z_0 \sqrt{\frac{t^2 - 1}{12}}, \end{aligned} \tag{6.10}$$

and where $z_0$ and $z_1$ are two constants determined by $\alpha_0$ and $\alpha_1$ respectively.

If $m_0 = m_1 = 1$, then $t = 2$ and $a = 1$, and the test reduces to the sign test. It is well known that the asymptotic efficiency of the sign test (as compared with the most powerful $t$-test) is $2/\pi$. Consequently, we may first find the asymptotic power efficiency of a rank sum test as compared with the sign test, and then the corresponding power efficiency as compared with the $t$-test by multiplying the former by the constant $2/\pi$.

The large sample power efficiency of a rank sum test as compared with the sign test is now defined as

$$\mathcal{E}(d) = 2g' / [(m_0 + m_1)\, g''], \tag{6.11}$$

where $g'$ is the number of pairs of observations required by the sign test and $g''$ is the number of groups of samples required by the rank sum test. It is then readily seen that

$$\mathcal{E}(d) = \frac{2}{m_0 + m_1} \left(\frac{B_1(d; m_0, m_1, t, a)\, B_2(d; 1, 1, 2, 1)}{B_1(d; 1, 1, 2, 1)\, B_2(d; m_0, m_1, t, a)}\right)^2. \tag{6.12}$$

If we now let

$$E(x(r_j \mid m_0 + m_1)) \tag{6.13}$$

be the mean of the $r_j$th order statistic in a sample of size $(m_0 + m_1)$ from a population with cdf $N(y; 0, 1)$ and let

$$\begin{aligned} c_1(R \mid m_0 + m_1) &= \sum_{j=1}^{m_1} E(x(r_j \mid m_0 + m_1)), \\ A_j(m_0, m_1, a) &= \sum_{i=(j-1)a+1}^{ja} c_1(R_i \mid m_0 + m_1), \end{aligned} \tag{6.14}$$

then, the asymptotic power efficiency is readily seen to be

$$\mathcal{E}(0) = \lim_{d \to 0} \mathcal{E}(d) = \frac{24\pi \Big[\sum_{j=1}^{t} j\, A_j(m_0, m_1, a)\Big]^2}{(t^2 - 1)(m_0 + m_1) \binom{m_0 + m_1}{m_0}^2}. \tag{6.15}$$
By the use of the mean values of the order statistics computed by Godwin
[4], some of the efficiencies are computed and displayed in Table 3; here again
the ordering is made according to the $c_1(R)$ criterion. The values $2\mathcal{E}(0)/\pi$ in
Table 3 are to be interpreted as the asymptotic power efficiencies of the rank
sum tests as compared with the t-tests.

In conclusion, we remark that the asymptotic power efficiency of one-sided 
rank sum tests can be obtained similarly according to formula (6.1) in [11]. 
This can be shown to be equivalent to (6.15) above. 

The author wishes to acknowledge the debt to his wife Ying-Lan Tsao, who 
carried out the computations of Tables 1, 2, and 3. 


TABLE 3 


Asymptotic power efficiencies of the rank sum tests 


[Table body not recoverable from the source. Columns: $m_0$, $m_1$, $t$, $a$, $\mathcal{E}(0)$, $2\mathcal{E}(0)/\pi$.]

REFERENCES

[1] W. J. Dixon, "Power under normality of several nonparametric tests," Ann. Math. Stat., Vol. 25 (1954), pp. 610-614.
[2] W. J. Dixon, "Power functions of the sign test and power efficiency for normal alternatives," Ann. Math. Stat., Vol. 24 (1953), pp. 467-473.
[3] W. J. Dixon and A. M. Mood, "The statistical sign test," J. Amer. Stat. Assn., Vol. 41 (1946), pp. 557-566.
[4] H. J. Godwin, "Some low moments of order statistics," Ann. Math. Stat., Vol. 20 (1949), pp. 279-285.
[5] W. Hoeffding, "Optimum nonparametric tests," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1951, pp. 83-92.
[6] E. L. Lehmann, "The power of rank tests," Ann. Math. Stat., Vol. 24 (1953), pp. 23-43.
[7] W. E. Milne, Numerical Calculus, Princeton University Press, 1949.
[8] I. R. Savage, "Bibliography on nonparametric statistics and related topics," J. Amer. Stat. Assn., Vol. 48 (1953), pp. 844-906.
[9] D. Teichroew, "Empirical power functions for nonparametric two-sample tests for small samples," Ann. Math. Stat., Vol. 26 (1955), pp. 340-344.
[10] M. E. Terry, "Some rank order tests which are most powerful against specific parametric alternatives," Ann. Math. Stat., Vol. 23 (1952), pp. 346-366.
[11] C. K. Tsao, "Rank sum tests of fit," Ann. Math. Stat., Vol. 26 (1955), pp. 94-104.
[12] C. K. Tsao, "Distribution of the sum in random samples from a discrete population," Ann. Math. Stat., Vol. 27 (1956), pp. 703-712.
[13] C. K. Tsao, "Sequential rank sum tests," submitted for publication in the Ann. Math. Stat.
[14] B. L. van der Waerden, Modern Algebra, Frederick Ungar Publishing Co., 1949.
[15] A. Wald, Sequential Analysis, John Wiley & Sons, Inc., 1947.
[16] Tables of binomial probability distribution, National Bureau of Standards, Applied Math. Series No. 6, Washington, D. C.
[17] Tables of normal probability functions, National Bureau of Standards, Applied Math. Series No. 23, Washington, D. C.




SOME USES OF QUASI-RANGES¹

By J. T. Chu

Case Institute of Technology


1. Summary. Confidence intervals for, and tests of hypotheses about, the
interquantile distance are obtained, using one or two properly chosen quasi-ranges.
Consistency (of the estimates and tests) is proved. Applications are also
given to making inferences about the standard deviations of distributions whose
cdf's are of the form $F((x - \mu)/\lambda)$.


2. Introduction. Let $F(x)$ be the cdf (cumulative distribution function) of a
given distribution. For a fixed $p$, $0 < p < 1$, any $\xi_p$ satisfying

(1) $F(\xi_p - 0) \le p \le F(\xi_p)$

is called a quantile of order $p$ (or $p$-quantile) of the given distribution. If there
exist two (and hence infinitely many) such $\xi_p$'s, then one of them is chosen
as the $p$-quantile. Let $\xi_q$ be the $q$-quantile, where $p < q < 1$. The difference
$\xi_q - \xi_p$ is called an interquantile distance. For two reasons we are interested in
methods of inference about $\xi_q - \xi_p$. First, the quantity itself is sometimes used
as a measure of dispersion of the distribution. (An example is $\xi_{.75} - \xi_{.25}$, known
as the interquartile range.) Secondly, for many familiar distributions, $\xi_q - \xi_p$
differs from the standard deviation of the distribution only by a constant factor;
consequently any inference about the former can be readily transformed into
one about the latter. (See Section 4C.)

Let a random sample of size $n$ be drawn and $x_1, x_2, \cdots, x_n$ be the corresponding
order statistics (in ascending order of magnitude). For any integers
$r$ and $s$ where $1 \le r \le s \le n$, the difference $x_s - x_r$ is called a quasi-range.
(Conventionally, $x_n - x_1$ is called a range and $x_s - x_r$ a quasi-range only if
$s = n - r + 1$, where $1 < r < s$. See, for example, [1].) Symmetric quasi-ranges
$(x_{n-r+1} - x_r)$ and their linear combinations are useful in statistical inference.
In fact, many uses of them are well known. (See, for example, [1], [5], and the
references cited there.) In this paper we shall see some distribution-free methods
of using quasi-ranges (not necessarily symmetric) in making inferences about
$\xi_q - \xi_p$. Confidence intervals for $\xi_q - \xi_p$ are obtained of the form
$(x_v - x_u, x_s - x_r)$. To test the hypothesis, say $\xi_q - \xi_p = d$, we may then use
as a critical region: $x_s - x_r < d$ or $x_v - x_u > d$. If the integers $r$, $s$, $u$, and $v$
satisfy respectively $B_n(s - 1, q) - B_n(r - 1, p) \ge 1 - \alpha$ and $-B_n(v - 1, q) +
B_n(u - 1, p) \ge 1 - \alpha$, where $B_n(r, p)$ is the binomial cdf defined by (2), then
the corresponding confidence interval has confidence coefficient at least
$1 - 2\alpha$, and the test has significance level at most $2\alpha$. If there exists more than

Received March 4, 1955; revised October 22, 1956. 


1 Part of the work was done at the University of North Carolina under the sponsorship 
of the Office of Naval Research. 




one set of integers satisfying the inequalities, two optimal methods of selection
are suggested. For large samples, however, these methods are shown to be
equivalent, assuming that the parent cdf $F(x)$ is continuous at $x = \xi_p$ or $\xi_q$.
Furthermore, it is shown that if $F(x)$ meets some additional continuity requirements,
then the statistics $x_s - x_r$ and $x_v - x_u$, obtained by the optimal methods
of selection, are consistent estimates of $\xi_q - \xi_p$; and the corresponding test is
consistent with respect to the alternative $\xi_q - \xi_p \ne d$.


Some applications are given in Section 4. 


3. Consistency. Let $F(x)$ be the cdf of a given distribution. Suppose that for
given $p$ and $q$, where $0 < p < q < 1$, $\xi_p$ and $\xi_q$ are uniquely defined. Let
$x_1, x_2, \cdots, x_n$ be the order statistics of a random sample of size $n$.

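Lemma 1. If $r$, $s$, $u$, and $v$ are integers such that $1 \le r \le s \le n$ and
$1 \le u \le v \le n$, and

(2) $B_n(r, p) = \sum_{i=0}^{r} \binom{n}{i} p^i (1 - p)^{n-i}$,

then

(3) $P(x_s - x_r \ge \xi_q - \xi_p) \ge B_n(s - 1, q) - B_n(r - 1, p) = L$,

(4) $P(x_v - x_u \le \xi_q - \xi_p) \ge -B_n(v - 1, q) + B_n(u - 1, p) = L'$,

where $P(A)$ is the probability of the event $A$.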
If $F(x)$ is continuous at $x = \xi_p$ or $\xi_q$, then


(3’) P(z, = &, 
(4’) P(x, —: 


Proof. Let $P(A, B)$ denote the probability of simultaneous occurrence of
the events $A$ and $B$. Clearly $P(x_s - x_r \ge \xi_q - \xi_p) \ge P(x_s \ge \xi_q, x_r \le \xi_p) \ge
P(x_s \ge \xi_q) + P(x_r \le \xi_p) - 1$. Now $P(x_s \ge \xi_q) = B_n(s - 1, F(\xi_q - 0)) \ge
B_n(s - 1, q)$, since for fixed $n$ and $r$, $B_n(r, p)$ is a decreasing function of
$p$ ([3], p. 127). Similarly $P(x_r \le \xi_p) \ge 1 - B_n(r - 1, p)$. Therefore we have (3).
To prove (3'), apply the same method to $P(x_s - x_r < \xi_q - \xi_p)$. Likewise we
obtain the other inequalities.

It can be shown easily (by (11)) that if $n$ is sufficiently large, then for any
$\alpha$ where $0 < \alpha < 1$, there exists at least one set of integers $r$, $s$, $u$, and $v$ for
which

(5) $L$ and $L' \ge 1 - \alpha$.

The corresponding $x_s - x_r$ and $x_v - x_u$ are then respectively confidence upper
and lower bounds for $\xi_q - \xi_p$ with confidence coefficients at least $1 - \alpha$. If there
are two or more sets of integers satisfying (5), the following methods of selection
may be used. (i) Select the pair of $r$ and $s$ which minimizes $s - r$, and that of
$u$ and $v$ which maximizes $v - u$. This method has some intuitive appeal. But
quite often there exists more than one pair of $r$ and $s$ and one pair of $u$ and $v$
which satisfy the requirements. In such cases, selection perhaps should be made
in accordance with practical considerations. (See a similar case in [7], p. 15.)
Further, in the case of $q = 1 - p$, the quasi-ranges selected by this method
are, in general, not necessarily symmetric or nearly so. But if $\alpha < .5$ and $n$ is
large, then at least one $x_s - x_r$ and one $x_v - x_u$ are symmetric or nearly so. (See
Section 4B.) (ii) This method, to be described later, determines the integers
$r, \cdots,$ and $v$ uniquely for given $p$, $q$, $n$, and $\alpha$. It is more or less a natural generalization
of the method of using symmetric quasi-ranges for the case $q = 1 - p$.
These two methods are not identical in general, but become equivalent (in the
sense of (16)) when sample size increases indefinitely.
Lemma 2. For integers $t$ and $w$, where $0 \le t, w \le n - 1$, and

(6) $c = q - p$,

let

(7) $r = [(n - t)p/(1 - c)] + 1$, $s = r + t$,

(8) $u = [(n - w)p/(1 - c)] + 1$, $v = u + w$,

where $[a]$ denotes the integral part of $a$. Then $L$ and $L'$, defined in (3) and (4), are
respectively non-decreasing and non-increasing functions of $t$ and $w$. Further, let
$c_1$ and $c_2$ be such that $0 < c_1 < c < c_2 < 1$. If $t = [n c_2]$ and $w = [n c_1]$, then

(9) $\lim_{n \to \infty} L = \lim_{n \to \infty} L' = 1$.

On the other hand, if $t = [n c_1]$ and $w = [n c_2]$, then

(10) $\lim_{n \to \infty} U = \lim_{n \to \infty} U' = 0$,

provided that $F(x)$ is continuous at $x = \xi_p$ or $\xi_q$.

Proof. From (7) and (8), it can be seen that $1 \le r, s, u, v \le n$. For example,
$s \le n$ because $s \le (n - t)p/(1 - c) + t + 1 < n + 1$. Hence $L$ and $L'$ are
well-defined functions of $t$ and $w$. Now $r$ is a non-increasing function of $t$. But if
$t$ is increased by 1, $r$ is decreased at most by 1. Hence $s$ is a non-decreasing function
of $t$, consequently so is $L$. In a similar way we show that $L'$ is a non-increasing
function of $w$.

It is well known ([2], p. 200, and [4], p. 193) that as $n \to \infty$,

(11) $B_n(r, p) - \Phi(x) \to 0$,

uniformly in $r$, $0 \le r \le n$, where $x = (r - np)/\sqrt{np(1 - p)}$ and

(12) $\Phi(x) = \int_{-\infty}^{x} (2\pi)^{-1/2} e^{-y^2/2}\, dy$.

As $n \to \infty$, it can be shown that if $t = [n c_2]$, and $r$ and $s$ are defined by (7),

(13) $(r - 1 - np)/n^{1/2} \to -\infty$, $(s - 1 - nq)/n^{1/2} \to \infty$;

for $r - 1 - np = n(c - c_2)p/(1 - c) + O(1)$ and $s - 1 - nq = nb + O(1)$,
where $b = (1 - c_2)p/(1 - c) + c_2 - q = (c_2 - c)(1 - q)/(1 - c) > 0$. Combining
(3), (11), and (13), we obtain $\lim_{n \to \infty} L = 1$. Likewise we prove the
rest of (9) and (10).

Lemma 3. Corresponding to given $n$ and $\alpha$ ($0 < \alpha < 1$), let $r_1$ and $s_1$ ($u_1$ and $v_1$)
be a pair of integers which minimizes $s - r$ (maximizes $v - u$) among all pairs of
integers $r$ and $s$ ($u$ and $v$) such that $1 \le r\,(u) \le s\,(v) \le n$ and $L\,(L') \ge 1 - \alpha$.
Let $t_2$ ($w_2$) be the least (greatest) integer among $0, 1, \cdots,$ and $n - 1$ such that if
$r_2$ and $s_2$ ($u_2$ and $v_2$) are defined accordingly by (7) ((8)), $L\,(L') \ge 1 - \alpha$. (From
Lemma 2, such $r_i$, $s_i$, $u_i$, and $v_i$, $i = 1, 2$, exist for any $\alpha$ if $n$ is sufficiently large.)
For fixed $p_i$ and $q_i$, $i = 1, 2$, where $p_1 < p < p_2$ and $q_1 < q < q_2$, define

(14) $k_i = [n p_i] + 1$, $m_i = [n q_i] + 1$.

Assume that $F(x)$ is continuous at $x = \xi_p$ or $\xi_q$. Then for sufficiently large $n$ and
$i = 1, 2$,

(15) $k_1 \le r_i, u_i \le k_2$, $m_1 \le s_i, v_i \le m_2$.

As a consequence of (15), we have

(16) $r_1 \sim r_2$, $s_1 \sim s_2$, $u_1 \sim u_2$, and $v_1 \sim v_2$.

($r_1 \sim r_2$ means $\lim_{n \to \infty} r_1/r_2 = 1$.)
Proor. By (9) and (10), [nm a] S & S [n ce] for any fixed c; and c; for which 


Cc; < ¢ < ¢:, provided that n is sufficiently large. Choose c; and c sufficiently 
close to c, then r, = n(1 — cs) p/(1 - c) 2 npi +1 2 hk, and 


s& S n{(l — aq) p/l —c) +e] +0(1) S na Sm. 


Similarly we have r. S k, and s2 = m,. In the same way we show that uw. and 
v2 satisfy (15). Further, from (11), s, = m, and r, S ky for large n. Suppose that 
8, > m, for some n however large. Let pi < p < pz andq < q2 < q@;and p; 
and p2, and qs be respectively so close to p and q that q. — pi < q2 — po. Let 
k; = [n p:] + 1, i = 1, 2, and m; = [n qo] + 1. Then for large n, 7, S ke, 72 = 
ky, and s < m;. Therefore s; — 11 > mz — kz > my — ki = 82 — Pe. This, 
however, contradicts the definitions of r; and s,. Hence s; S m,. Similarly 
we show that 7; = k, and the rest of (15). 

The following lemma is a known fact ([2], p. 369). We state it without a proof.

Lemma 4. Let a continuous distribution be given with cdf $F(x)$ and pdf (probability
density function) $f(x)$. Suppose that for $0 < p < q < 1$, $\xi_p$ and $\xi_q$ are uniquely
defined; $f(\xi_p), f(\xi_q) \ne 0$; and $f'(x)$, the derivative, is continuous in some neighborhoods
of $x = \xi_p$ and $\xi_q$. If $k = [n p] + 1$ and $m = [n q] + 1$ (we assume that
$n p$ and $n q$ are not integers), and $x_k$ and $x_m$ are the corresponding order statistics
of a sample of size $n$, then as $n \to \infty$, $x_m - x_k$ has an asymptotically normal distribution
with mean $\xi_q - \xi_p$ and variance $O(1/n)$.





As a consequence of the previous lemmas, we have

Theorem. Let a continuous distribution be given whose cdf and pdf satisfy the
continuity conditions stated in Lemma 4. For given $n$ and $\alpha$, let $r_i$, $s_i$, $u_i$, and
$v_i$, $i = 1, 2$, be the integers defined in Lemma 3. If $x_{r_i}$, etc., are respectively the
$r_i$th, etc., order statistics of a sample of size $n$, then $x_{s_i} - x_{r_i}$ and $x_{v_i} - x_{u_i}$, $i = 1, 2$,
are consistent estimates of $\xi_q - \xi_p$.

Proof. Following Lemmas 3 and 4, for given $\delta, \epsilon > 0$, if $p_1$, $q_2$ and $\delta'$ are
properly chosen, $k_1 = [n p_1] + 1$, and $m_2 = [n q_2] + 1$, then, for sufficiently
large $n$, $P(x_{s_i} - x_{r_i} > \xi_q - \xi_p + \delta) \le P(x_{m_2} - x_{k_1} > \xi_q - \xi_p + \delta) \le
P(x_{m_2} - x_{k_1} > \xi_{q_2} - \xi_{p_1} + \delta') \le \epsilon$. In a similar way, we easily complete the proof.


4. Applications. 

A. Confidence intervals and tests of hypotheses. In Section 3 we proved, for
given $\alpha$ and sufficiently large $n$, the existence of the integers $r_i$, $s_i$, $u_i$, and
$v_i$, $i = 1, 2$, defined in Lemma 3. To actually find these integers, we may use,
for example, [6] and [8]. Then $x_{s_i} - x_{r_i}$ and $x_{v_i} - x_{u_i}$ are respectively confidence
upper and lower bounds for $\xi_q - \xi_p$ with confidence coefficients at least $1 - \alpha$,
and $(x_{v_i} - x_{u_i}, x_{s_i} - x_{r_i})$ is a confidence interval with confidence coefficient at
least $1 - 2\alpha$.

Let $H_0$: $\xi_q - \xi_p = d$. Then the tests using as critical regions $x_{s_i} - x_{r_i} < d$;
$x_{v_i} - x_{u_i} > d$; and $x_{s_i} - x_{r_i} < d$ or $x_{v_i} - x_{u_i} > d$, $i = 1, 2$, are respectively:
(i) of significance levels at most $\alpha$, $\alpha$, and $2\alpha$, and (ii) consistent with respect to
the alternatives $\xi_q - \xi_p < d$; $\xi_q - \xi_p > d$; and $\xi_q - \xi_p \ne d$, provided that the
continuity conditions of Lemma 4 are satisfied. A test, for testing a given hypothesis,
is said to be consistent with respect to a certain alternative, if, whenever
the alternative is true, the power of the test tends to unity as sample size tends
to infinity.

As an example, let us find confidence upper bounds for $\xi_{.6} - \xi_{.3}$ with confidence
coefficients at least .95, using a random sample of size 50. It is easy to see that
$L \ge 1 - \alpha$ of (5) is equivalent to

(17) $B_n(i, p) + B_n(j, 1 - q) \le \alpha$,

where $i = r - 1$ and $j = n - s$; and $s - r$ is minimized if $i + j$ is maximized.
Now $n = 50$, $p = .3$, $q = .6$, $1 - q = .4$, and $\alpha = .05$. The largest integers $i$
and $j$ for which $B_{50}(i, .3)$ and $B_{50}(j, .4) \le .05$ are 9 and 13. For $n = 50$, and
$i = 9, 8$ respectively, the largest integers $j$ for which (17) holds are 11 and 13.
Therefore $i + j$ is maximized if $i = 8$ and $j = 13$. Hence $r_1 = 9$ and $s_1 = 37$.
Further, if $t = 28$, then by (7), $r = 10$, $s = 38$, $i = 9$, and $j = 12$, for which
(17) does not hold. But if $t = 29$, then $r = 10$, $s = 39$, $i = 9$, $j = 11$, and (17)
holds. Hence $r_2 = 10$ and $s_2 = 39$.
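The example above can be retraced numerically. The following is only a sketch under stated assumptions: the function names (`binom_cdf`, `bound_L`, `method_i`, `method_ii`) are hypothetical and not from the paper, the binomial cdf is summed directly instead of being read from the tables [6] and [8], and `p`, `q` are passed as exact fractions so that the integral part in (7) is not disturbed by rounding. When several pairs tie under method (i), this sketch simply returns the first one found.

```python
from fractions import Fraction
from math import comb, floor


def binom_cdf(n, r, p):
    """B_n(r, p) of (2): sum_{i=0}^{r} C(n, i) p^i (1 - p)^(n - i)."""
    if r < 0:
        return 0.0
    p = float(p)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(r, n) + 1))


def bound_L(n, r, s, p, q):
    """Lower bound L of (3) for P(x_s - x_r >= xi_q - xi_p)."""
    return binom_cdf(n, s - 1, q) - binom_cdf(n, r - 1, p)


def method_i(n, p, q, alpha):
    """Method (i): a pair (r, s) minimizing s - r subject to L >= 1 - alpha."""
    best = None
    for r in range(1, n + 1):
        for s in range(r, n + 1):
            if bound_L(n, r, s, p, q) >= 1 - alpha:
                if best is None or s - r < best[1] - best[0]:
                    best = (r, s)
    return best


def method_ii(n, p, q, alpha):
    """Method (ii): least t in 0..n-1 such that (r, s) from (7) gives L >= 1 - alpha."""
    c = q - p
    for t in range(n):
        r = floor((n - t) * p / (1 - c)) + 1  # exact when p, q are Fractions
        s = r + t
        if bound_L(n, r, s, p, q) >= 1 - alpha:
            return r, s
    return None


if __name__ == "__main__":
    p, q = Fraction(3, 10), Fraction(3, 5)
    print(method_i(50, p, q, 0.05))   # pair (r_1, s_1)
    print(method_ii(50, p, q, 0.05))  # pair (r_2, s_2)
```

Exact fractions matter here because for $t = 29$ the quantity $(n - t)p/(1 - c)$ equals 9 exactly, and a floating-point evaluation could fall just below 9.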

B. A special case. To make inferences about $\xi_q - \xi_p$ when $q = 1 - p$, it seems
most natural to use symmetric quasi-ranges $(x_{n-r+1} - x_r)$. If $q = 1 - p$,
$s = n - r + 1$, and $v = n - u + 1$, then (5) becomes

(18) $B_n(r - 1, p) \le \alpha/2$, $B_n(u - 1, p) \ge 1 - \alpha/2$.





TABLE 1
Values of $r'$ and $u'$
$p = .25$ and $q = .75$

[Table entries not recoverable from the source.]


Let $r'$ and $u'$ be respectively the largest and smallest integers $r$ and $u$ for which
(18) holds. Let $s' = n - r' + 1$ and $v' = n - u' + 1$. Then $x_{s'} - x_{r'}$, $x_{v'} - x_{u'}$,
and $(x_{v'} - x_{u'}, x_{s'} - x_{r'})$ are confidence upper and lower bounds, and
interval for $\xi_{1-p} - \xi_p$ with confidence coefficients $1 - \alpha$, $1 - \alpha$, and $1 - 2\alpha$.
To find $r'$ and $u'$, we may use [6] and [8]. Table 1 is obtained in this way.
There $p = .25$ and $q = .75$. If, for example, $n = 30$ and $\alpha = .05$, then $r' = 3$
and $n - r' + 1 = 28$. Therefore $P(x_{28} - x_3 \ge \xi_{.75} - \xi_{.25}) \ge .95$. Likewise,
$P(x_{18} - x_{13} \le \xi_{.75} - \xi_{.25} \le x_{28} - x_3) \ge .90$.
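The symmetric indices defined by (18) can also be computed directly when the tables [6] and [8] are not at hand. This is a minimal sketch, not the author's procedure: `symmetric_indices` and `binom_cdf` are hypothetical names, and the binomial cdf is summed in floating point.

```python
from math import comb


def binom_cdf(n, r, p):
    """B_n(r, p) of (2): probability of at most r successes in n trials."""
    if r < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(r, n) + 1))


def symmetric_indices(n, p, alpha):
    """Return (r', s', u', v') for q = 1 - p, following (18).

    r' is the largest r with B_n(r - 1, p) <= alpha / 2,
    u' is the smallest u with B_n(u - 1, p) >= 1 - alpha / 2.
    Raises ValueError if no admissible r exists (n too small for this alpha).
    """
    r_prime = max(r for r in range(1, n + 1) if binom_cdf(n, r - 1, p) <= alpha / 2)
    u_prime = min(u for u in range(1, n + 1) if binom_cdf(n, u - 1, p) >= 1 - alpha / 2)
    return r_prime, n - r_prime + 1, u_prime, n - u_prime + 1


if __name__ == "__main__":
    # The n = 30, alpha = .05 case discussed in the text.
    print(symmetric_indices(30, 0.25, 0.05))
```

For $n = 30$, $p = .25$, $\alpha = .05$ this gives the indices 3, 28, 13, 18 used in the probabilities quoted above.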

A question then follows. Are the quasi-ranges, obtained by applying the
general methods (in Section 4A) to this particular case ($q = 1 - p$), symmetric
and identical to the corresponding ones obtained by the methods just described?
More precisely, if $q = 1 - p$, are the integers $r_i$, $s_i$, $u_i$, and $v_i$, $i = 1, 2$ (defined
in Lemma 3), equal respectively to $r'$, $s'$, $u'$, and $v'$ (defined by (18) with the
same $n$, $\alpha$, and $p$)? We have the following answers. (i) $r' = r_2$ or $r_2 - 1$, $s' = s_2$,
$u' = u_2$, and $v' = v_2$ or $v_2 - 1$. In other words, $x_{s_2} - x_{r_2}$ and $x_{v_2} - x_{u_2}$ are
either identical to $x_{s'} - x_{r'}$ and $x_{v'} - x_{u'}$, or only slightly different from them.
(ii) Generally, no similar relations exist between $r'$ and $r_1$, etc. In the first place,
the integers $r_1$, $s_1$, $u_1$, and $v_1$ are not always uniquely determined. Sometimes
none of the corresponding quasi-ranges is symmetric or nearly so. (For example,
if $n = 100$, $p = .1$, $q = .9$, and $\alpha = .99$, then, following (17), $r_1 = 4, 5, 6, 15,
16, 17$; and $s_1 = 84, 85, 86, 95, 96, 97$.) If, however, $\alpha < .5$ and $n$ is sufficiently
large, then it can be shown that one set of $r_1, \cdots,$ and $v_1$ coincides with $r_2, \cdots,$
and $v_2$. Therefore for $\alpha < .5$ and large $n$, at least one $x_{s_1} - x_{r_1}$ and one
$x_{v_1} - x_{u_1}$ are either identical to $x_{s'} - x_{r'}$ and $x_{v'} - x_{u'}$ or only slightly different from
them.

We shall now prove (i) and (ii). If $q = 1 - p$, then from (7), $r = [(n - t)/2] + 1$.
If, corresponding to given $n$ and $\alpha$, $(n - t_2)/2$, where $t_2$ is defined in Lemma 3,
is not an integer, then $s_2 = n - r_2 + 1$. Otherwise $s_2 = n - (r_2 - 1) + 1$. In
the first case, we see that (18) holds for $r = r_2$ but not for $r = r_2 + 1$. (This is
because (5), $L \ge 1 - \alpha$, holds for $t = t_2$, $r = r_2$, and $s = s_2$; while $L < 1 - \alpha$
holds for $t = t_2 - 1$, $r = r_2 + 1$, and $s = s_2$.) Hence $r' = r_2$. In the
second case, (18) holds for $r = r_2 - 1$, but not for $r = r_2$. Hence $r' = r_2 - 1$.
Furthermore, if $\alpha < .5$ and $n$ is sufficiently large, then no $i$ and $j$ satisfying (17)
can be greater than $np$. Using the fact that $\binom{n}{i} p^i (1 - p)^{n-i}$ is an increasing
function of $i$ for all $0 \le i < (n + 1)p$, it is easy to see that one of those pairs
of $i$ and $j$ for which (17) holds and $i + j$ is maximized must be such that $i = j$
or $i = j + 1$. The integers $r_1$ and $s_1$, corresponding to this pair of $i$ and $j$, are
equal respectively to $r_2$ and $s_2$. ($r_1 = i + 1$, $s_1 = n - r_1 + 1$ or $n - r_1 + 2$.
Let $t = s_1 - r_1$ in (7), then $r = r_1$ and $s = s_1$. Hence $t_2 = s_1 - r_1$, $r_2 = r_1$,
and $s_2 = s_1$.) Finally, in a similar way, we show the statements concerning
$u_2$, $v_2$, $u'$, and $v'$.

C. The standard deviation. Let $f(x)$ be a pdf of the form $(1/b) f_0((x - a)/b)$,
where $-\infty < x < \infty$, and $a$ and $b > 0$ are the parameters. If $m$, $m_0$, $\sigma^2$, and
$\sigma_0^2$ are respectively the means and variances of $f(x)$ and $f_0(x)$, then $m = a +
b m_0$ and $\sigma^2 = b^2 \sigma_0^2$. If $\xi_p$ and $\xi_p^0$ are the $p$-quantiles of $f(x)$ and $f_0(x)$, then
$\xi_p = a + b \xi_p^0$. It follows that $\xi_q - \xi_p = b(\xi_q^0 - \xi_p^0)$ and

(19) $\sigma = (\xi_q - \xi_p)/c_0$,

where $c_0 = (\xi_q^0 - \xi_p^0)/\sigma_0$ depends on $p$, $q$, and $f_0(x)$, but not on $a$ and $b$. Therefore
for given $p$, $q$, and $f_0(x)$, any inference about $\xi_q - \xi_p$ can be readily transformed
into one about $\sigma$. Thus if $x_s - x_r$ is a confidence bound for $\xi_q - \xi_p$, then
$(x_s - x_r)/c_0$ is a confidence bound for $\sigma$.

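For each of the following types of distribution, Table 2 gives their standard
deviations, the $c_0$'s defined by (19), and the values of the $c_0$'s corresponding to
$p = .25$ and $q = .75$. The pdfs $f(x)$ are

(20) Normal: $(1/\sigma\sqrt{2\pi}) \exp[-(x - \mu)^2/2\sigma^2]$, $-\infty < x < \infty$;
(21) Laplace: $(1/2\lambda) \exp(-|x - \mu|/\lambda)$, $-\infty < x < \infty$;
(22) Triangular: $(1/\lambda)\{1 - |x - \mu|/\lambda\}$, $|x - \mu| \le \lambda$;
(23) Rectangular: $1/2\lambda$, $|x - \mu| \le \lambda$;
(24) Exponential: $(1/\lambda) \exp[-(x - \mu)/\lambda]$, $x \ge \mu$.

TABLE 2

Distribution    Standard deviation    $c_0$
Normal          $\sigma$              $\xi_q^0 - \xi_p^0$
Laplace         $\sqrt{2}\,\lambda$   $(1/\sqrt{2})[\pm \log p_1 - (\pm \log q_1)]$
Triangular      $\lambda/\sqrt{6}$    $\sqrt{6}[\pm(1 - \sqrt{q_1}) - (\pm(1 - \sqrt{p_1}))]$
Rectangular     $\lambda/\sqrt{3}$    $2\sqrt{3}(q - p)$
Exponential     $\lambda$             $\log[(1 - p)/(1 - q)]$

[The column of numerical values of $c_0$ for $p = .25$ and $q = .75$ is not recoverable from the source.]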
lo 







For the normal distribution of (20), $c_0$ is not explicitly given in terms of $p$
and $q$, but $\xi_p^0$ (and similarly $\xi_q^0$) satisfies $\Phi(\xi_p^0) = p$, where $\Phi$ is given by (12), and
can be found from a normal probability table. Further, in the formulas for the
$c_0$'s corresponding to Laplace and triangular distributions, $p_1$ (and similarly $q_1$)
is defined to be $1 - |1 - 2p|$; and the $+$ or $-$ sign associated with $p_1$ should
be used according as $p \ge \frac{1}{2}$ or $\le \frac{1}{2}$. Finally, as an example, let a sample of size
50 be drawn from the exponential distribution of (24). Let $p = .25$, $q = .75$,
and $\alpha = .025$. From Table 1, $r' = 6$ and $u' = 21$. From Table 2, the standard
deviation is $\lambda$ and $c_0 = 1.10$. Therefore $((x_{30} - x_{21})/1.10, (x_{45} - x_6)/1.10)$ is a
confidence interval for $\lambda$ with confidence coefficient at least .95.
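The constants $c_0 = (\xi_q^0 - \xi_p^0)/\sigma_0$ of (19) can be evaluated from the quantile functions of the five densities (20) to (24). The sketch below is illustrative only: the function names are hypothetical, the closed forms are derived here from the stated densities and are not copied from Table 2, and the normal quantile comes from the standard library in place of a printed table.

```python
from math import log, sqrt
from statistics import NormalDist


def p1(p):
    """p_1 = 1 - |1 - 2p|, as defined in the text."""
    return 1 - abs(1 - 2 * p)


def sign(p):
    """+1 if p >= 1/2, -1 otherwise (the sign rule stated in the text)."""
    return 1.0 if p >= 0.5 else -1.0


def c0_normal(p, q):
    z = NormalDist().inv_cdf  # standard normal quantile
    return z(q) - z(p)


def c0_laplace(p, q):
    return (sign(p) * log(p1(p)) - sign(q) * log(p1(q))) / sqrt(2)


def c0_triangular(p, q):
    return sqrt(6) * (sign(q) * (1 - sqrt(p1(q))) - sign(p) * (1 - sqrt(p1(p))))


def c0_rectangular(p, q):
    return 2 * sqrt(3) * (q - p)


def c0_exponential(p, q):
    return log((1 - p) / (1 - q))


if __name__ == "__main__":
    for name, f in [("normal", c0_normal), ("laplace", c0_laplace),
                    ("triangular", c0_triangular), ("rectangular", c0_rectangular),
                    ("exponential", c0_exponential)]:
        print(f"{name:12s} c0(.25, .75) = {f(0.25, 0.75):.3f}")
```

For the exponential case with $p = .25$, $q = .75$ the value is $\log 3$, which rounds to the 1.10 used in the example above.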


REFERENCES 


[1] J. H. Cadwell, "The distribution of quasi-ranges in samples from a normal population," Ann. Math. Stat., Vol. 24 (1953), pp. 603-613.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[3] W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 1, John Wiley & Sons, 1951.

[4] P. Lévy, Calcul des Probabilités, Gauthier-Villars, Paris, 1955.

[5] F. Mosteller, "On some useful 'inefficient' statistics," Ann. Math. Stat., Vol. 17 (1946), pp. 377-408.

[6] H. G. Romig, 50-100 Binomial Tables, John Wiley & Sons, 1953.

[7] S. S. Wilks, "Order statistics," Bull. Amer. Math. Soc., Vol. 54 (1948), pp. 6-50.

[8] Tables of the Binomial Probability Distribution, National Bureau of Standards, Washington, D. C., 1950.





MODIFIED RANDOMIZATION TESTS FOR NONPARAMETRIC
HYPOTHESES¹

By Meyer Dwass
Northwestern University and Stanford University


1. Introduction and summary. Suppose $X_1, \cdots, X_m, Y_1, \cdots, Y_n$ are
$m + n = N$ independent random variables, the $X$'s identically distributed and
the $Y$'s identically distributed, each with a continuous cdf. Let

$z = (z_1, \cdots, z_m, z_{m+1}, \cdots, z_N) = (x_1, \cdots, x_m, y_1, \cdots, y_n)$

represent an observation on the $N$ random variables and let

$u(z) = (1/m) \sum_{i=1}^{m} z_i - (1/n) \sum_{i=m+1}^{N} z_i = \bar{x} - \bar{y}$.

Consider the $r = N!$ $N$-tuples obtained from $(z_1, \cdots, z_N)$ by making all permutations
of the indices $(1, \cdots, N)$. Since we assume continuous cdf's, then with
probability one, these $r$ $N$-tuples will be distinct. Denote them by $z^{(1)}, \cdots, z^{(r)}$,
and suppose that they have been ordered so that

$u(z^{(1)}) \ge \cdots \ge u(z^{(r)})$.

Notice that since

$\bar{x} - \bar{y} = (1/m) \sum_{i=1}^{N} z_i - (N/m)\bar{y} = (N/n)\bar{x} - (1/n) \sum_{i=1}^{N} z_i$,

the same ordering can be induced by choosing $u(z) = c\bar{x}$ or $u(z) = -c\bar{y}$ for
any $c > 0$.

Assuming that the cdf's of $X_i$, $Y_i$ are of the form $F(x)$, $F(x - \Delta)$ respectively,
Pitman [2] suggested essentially the following test of the hypothesis $H'$ that
$\Delta = 0$. Select a set of $k$ ($k > 0$) integers $i_1, \cdots, i_k$ ($1 \le i_1 < \cdots < i_k \le r$).
If the observed $z$ is one of the points $z^{(i_1)}, \cdots, z^{(i_k)}$, reject $H'$, otherwise accept.
When $H'$ is true, the type one error does not depend on the specific form of the
distribution of the $X$'s and the $Y$'s and is in fact equal to $k/r$. The choice of the
rejection set $i_1, \cdots, i_k$ should depend on the alternative hypothesis. For instance,
if the experimenter wants protection against the alternative that the "$X$'s tend
to be larger than the $Y$'s," then the labels $1, \cdots, k$ might be reasonable. For
the alternative that the "$X$'s tend to be smaller than the $Y$'s" the analogous
procedure is to use the other tail, $r - k + 1, \cdots, r$. Against both alternatives,

Received, January 31, 1956. 
1 This work was supported in part by an Office of Naval Research contract at Stanford 
University. 




TABLE 1 


Under each $\alpha$ heading, the left-hand column is computed from (8) and the right-hand
column from a normal approximation. Computations were made only
for those values of $s$ such that $d + 1 = \alpha(s + 1)$ is an integer.

$\alpha^{-1}A(\alpha)$


.827 881 

.842 ‘ 891 

.902 

125 a 877 915 
.774 r .900 .931 
.824 .876 .922 .946 
.875 .912 .945 .962 


a two-tail procedure could be used. Lehmann and Stein have shown in [1] that
in the class of all tests (of size $\alpha = k/r$) of the hypothesis

$H$: the distribution of $X_1, \cdots, X_m, Y_1, \cdots, Y_n$ is invariant
under all permutations,

the single-tail test based on $1, \cdots, k$ is uniformly most powerful against the
alternatives that $F_1$ is an $N(\theta, \sigma)$ cdf, $F_2$ is an $N(\theta + \Delta, \sigma)$ cdf, $\Delta < 0$; the test
based on $r - k + 1, \cdots, r$ is uniformly most powerful for $\Delta > 0$.

A practical shortcoming of this procedure is the great difficulty in enumerating
the points $z^{(i)}$ and the evaluation of $u(z^{(i)})$ for each of them. For instance, even
after eliminating those permutations which always give the same value of $u$,
for sample sizes $m = n = 5$ there are $\binom{10}{5} = 252$ permutations to examine,
and for sample sizes $m = n = 10$ there are $\binom{20}{10} = 184{,}756$ permutations to
examine. In the following section, we propose the almost obvious procedure of
examining a "random sample" of permutations and making the decision to
accept or reject $H$ on the basis of those permutations only. Bounds are determined
for the ratio of the power of the original procedure to the modified one.
Some numerical values of these bounds are given in Table 1. The bounds there
listed correspond to tests which in both original and modified form have size $\alpha$,
and for which the modified test is based on a random sample of $s$ permutations
drawn with replacement. These have been computed for a certain class of alternatives
which is described below. For simplicity, we have restricted the main
exposition to the two-sample problem. In Section 5, we point out extensions to
the more general hypotheses of invariance studied in [1].


2. Description of modified procedure. We first make some definitions. For
any $z = (z_1, \cdots, z_N)$, let $T(z)$ be the set of all points obtained from $z$ by permuting
its coordinates. With probability one, all sets $T(z)$ contain $r = N!$
points $z^{(1)}, \cdots, z^{(r)}$ and we restrict our discussion to such sets. We also suppose
that they are ordered in the manner described earlier. Define $R^{(i)}$ to be the
union over all sets $T(z)$ of the points $z^{(i)}$ ($i = 1, \cdots, r$). Evidently $R^{(1)}, \cdots,
R^{(r)}$ are disjoint sets whose union is the whole sample space except for a set of
probability zero. Let $P(i) = P(R^{(i)})$. Restricting ourselves to the case $\Delta < 0$,
we describe the Pitman procedure given above in terms of a test $\varphi$, as follows:

$\varphi(z) = 1$ if $u(z) \ge u(z^{(k)})$,
$\varphi(z) = 0$ if $u(z) < u(z^{(k)})$,

where $z^{(1)}, \cdots, z^{(r)}$ are the points of $T(z)$ and $\varphi(z)$ is the probability with which
$H$ is rejected when $z$ is observed. Let $r^* = r^*(z)$ be the number of $z^{(i)}$ in $T(z)$ such
that $u(z^{(i)}) \ge u(z)$. (Notice that $R^{(i)}$ is the event that $r^*(z) = i$.) Then the test
described is equivalent to rejecting $H$ when $r^* \le k$ and accepting otherwise.

The modified procedure will be to make this decision on the basis of examining
a random subset of $T(z)$. Specifically we describe a modified test $\varphi_s$ as follows:
Select at random $s$ ($s < r$) points of $T(z)$. For simplicity, we suppose the sampling
from $T(z)$ is done with replacement. Let $r'^*$ equal the number of the $s$
points for which $u(z^{(i)}) \ge u(z)$. Then we define

$\varphi_s = 1$ if $r'^* \le d$,
$\varphi_s = 0$ if $r'^* > d$,

where $d$ ($0 \le d \le s$) is a predetermined integer. We point out that $\varphi_s$ is a randomized
test and that $r'^*$ depends not only on $z$ but on the $s$ points of $T(z)$ selected.
Let

$\Psi(t) = \sum_{i=0}^{d} \binom{s}{i} t^i (1 - t)^{s-i}$.
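The modified test $\varphi_s$ can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, $u$ is taken to be the difference of sample means as in the introduction, and each of the $s$ sampled points of $T(z)$ is generated as an independent uniformly random permutation of the pooled observations, which corresponds to sampling with replacement.

```python
import random


def mean_difference(z, m):
    """u(z): mean of the first m coordinates minus mean of the remaining ones."""
    return sum(z[:m]) / m - sum(z[m:]) / (len(z) - m)


def modified_randomization_test(x, y, s, d, rng=None):
    """Return (reject, count) for the modified test.

    count is the number of the s sampled permutations whose statistic is at
    least the observed one; H is rejected when count <= d.
    Note: the comparison uses floating-point means, so exact ties between
    distinct arrangements may be affected by rounding for non-integer data.
    """
    rng = rng or random.Random()
    z = list(x) + list(y)
    m = len(x)
    observed = mean_difference(z, m)
    count = 0
    for _ in range(s):
        perm = rng.sample(z, len(z))  # an independent uniformly random permutation
        if mean_difference(perm, m) >= observed:
            count += 1
    return count <= d, count


if __name__ == "__main__":
    x = [5.1, 6.3, 7.0, 6.8, 5.9]
    y = [4.2, 3.9, 5.0, 4.4, 4.8]
    # (d + 1) / (s + 1) = 5 / 100, i.e. nominal size .05
    print(modified_randomization_test(x, y, s=99, d=4, rng=random.Random(0)))
```

Choosing `s` and `d` with $(d + 1)/(s + 1) = \alpha$ matches the size condition used later in the paper.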


The following is easily verified.
PROPOSITION 1.

(1) $E\varphi_s = \sum_{i=1}^{r} \Psi(i/r) P(i)$, $E\varphi = \sum_{i=1}^{k} P(i)$.

REMARK. In particular, when $H$ is true, then $P(i) = 1/r$ and for large $r$,
$E_H\varphi_s$ is approximately equal to

(2) $\int_0^1 \Psi(t)\, dt$.

In what follows, we always assume that

$E_H\varphi_s = E_H\varphi = k/r$.


Notice that $\Psi(t)$ is a nonincreasing function in $(0, 1)$. This fact is used in deriving
the following bounds in Propositions 2 and 3.
PROPOSITION 2.

(3) $E\varphi_s \ge \Psi(\alpha) E\varphi$ ($\alpha = k/r$).

The above bound is quite weak. On the other hand, equality in (3) is attained
only when $P(i) = 0$ ($i \ne k$), $P(k) = 1$. It would not be unreasonable to say that
the alternatives against which $\varphi$ can be expected to be effective are those satisfying

(4) $P(1) \ge P(2) \ge \cdots \ge P(r)$.

In particular, (4) is satisfied when the $P(i)$ are the probabilities induced by any
simple alternative against which $\varphi$ is most powerful for all $0 < \alpha < 1$. According
to [1], this is true for the normal alternatives described in the introduction, uniformly
for $\Delta < 0$. Hence, we shall next determine the value of $\inf E\varphi_s / E\varphi$ over
all $P(1), \cdots, P(r)$ satisfying (4), and such that $\varphi$, $\varphi_s$ have size $\alpha = k/r$.
PROPOSITION 3. Suppose (4) is satisfied. Then

(5) $E\varphi_s \ge k^{-1} \sum_{i=1}^{k} \Psi(i/r)\, E\varphi$.

Proof.

$E\varphi_s / E\varphi = \sum_{i=1}^{r} \Psi(i/r) P(i) \Big/ \sum_{j=1}^{k} P(j)$

$= \sum_{i=1}^{k} \Psi(i/r) P(i) \Big/ \sum_{j=1}^{k} P(j) + \sum_{i=k+1}^{r} \Psi(i/r) P(i) \Big/ \sum_{j=1}^{k} P(j)$.

Hence, by replacing $P(i)$ with $P(i) / \sum_{j=1}^{k} P(j)$ for $i = 1, \cdots, k$, and with 0
for $i = k + 1, \cdots, r$, we do not increase the value of $E\varphi_s / E\varphi$ and we may as
well assume at the outset that $P(k + 1) = \cdots = P(r) = 0$. Now by the monotonicity
of $\Psi$, it is easy to see that subject to (4), $\sum_{i=1}^{k} \Psi(i/r) P(i) / \sum_{j=1}^{k} P(j)$
is minimized when $P(1) = \cdots = P(k) = 1/k$, which completes the proof.
REMARKS.
(a) It is evident from the proof that (5) holds if (4) is replaced by

(4') $P(1) \ge \cdots \ge P(k)$.

(b) By (5), $E\varphi_s / E\varphi \ge (r/k) \sum_{i=1}^{k} \Psi(i/r)/r$. For large $r$, $\sum_{i=1}^{k} \Psi(i/r)/r$ is
approximately equal to $\int_0^{\alpha} \Psi(t)\, dt$; hence $\inf E\varphi_s / E\varphi$ over all $P(i)$ satisfying
(4') approximately equals

(6) $\alpha^{-1} \int_0^{\alpha} \Psi(t)\, dt$.





Let $B[s, t]$ denote the number of successes in $s$ independent binomial trials with
$t$ the probability of success in each. Then

$P(B[s, t] \le d) = \Psi(t) = 1 - \dfrac{s!}{(s - d - 1)!\, d!} \int_0^t u^d (1 - u)^{s-d-1}\, du$.

Let $A(t) = \int_0^t \Psi(u)\, du$. After integration by parts, we have

(7) $A(t) = t\Psi(t) + \dfrac{s!}{(s - d - 1)!\, d!} \int_0^t u^{d+1} (1 - u)^{s-d-1}\, du$
$= t\Psi(t) + [(d + 1)/(s + 1)]\, P(B[s + 1, t] \ge d + 2)$.

By (2), $E_H\varphi = E_H\varphi_s = k/r$ is approximately equal to $A(1) = (d + 1)/(s + 1)$.
Suppose $d$ and $s$ are chosen so that $(d + 1)/(s + 1) = k/r = \alpha$. Then by (7),
the value of (6) is

(8) $\alpha^{-1}A(\alpha) = P(B[s, \alpha] \le d) + P(B[s + 1, \alpha] \ge d + 2)$.

Some values of $\alpha^{-1}A(\alpha)$ are given in Table 1.
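Formula (8) is straightforward to evaluate for moderate $s$. The following is a sketch, not the computation used for Table 1: the function names are hypothetical, and the binomial probabilities are computed through `lgamma` (an implementation choice made here) so that large $s$ does not overflow.

```python
from math import exp, lgamma, log


def binom_pmf(n, i, t):
    """P(B[n, t] = i) for 0 < t < 1, computed in log space."""
    log_coeff = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coeff + i * log(t) + (n - i) * log(1 - t))


def binom_cdf(n, d, t):
    """P(B[n, t] <= d)."""
    return sum(binom_pmf(n, i, t) for i in range(min(d, n) + 1))


def power_ratio_bound(s, d):
    """Return (alpha, value of (8)) with alpha = (d + 1) / (s + 1)."""
    alpha = (d + 1) / (s + 1)
    first = binom_cdf(s, d, alpha)                 # P(B[s, alpha] <= d)
    second = 1.0 - binom_cdf(s + 1, d + 1, alpha)  # P(B[s + 1, alpha] >= d + 2)
    return alpha, first + second


if __name__ == "__main__":
    for s, d in [(19, 0), (99, 4), (999, 49)]:
        alpha, bound = power_ratio_bound(s, d)
        print(f"s = {s:4d}, d = {d:3d}, alpha = {alpha:.3f}, bound = {bound:.3f}")
```

For $d = 0$ the two terms collapse to $1 - (1 - \alpha)^{s+1}$, which gives a quick hand check of the code.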
3. Concluding remarks for the two-sample problem. 
(a) The main point is that instead of basing our decision on all $\binom{m+n}{n}$ permutations
of the observations, we can base it on a smaller number of permutations
and the power of the modified test will be "close" to that of the most
powerful nonparametric test. It may be argued that $s$ still has to be ridiculously
large. For instance, if $\alpha = .05$, $s = 10^3$, then (8) equals .945; and if $\alpha = .05$,
$s = 10^4$, then (8) equals .98. However, the optimum test is usually completely
impossible. For instance, if $m = 20$, $n = 20$, then $\binom{40}{20} > 10^{11}$, and if there
were a machine that could check 10 permutations a second, the job would run
something on the order of 1000 years. The point is, then, that an impossible test
can be made at least possible, if not always practical.
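The size of the enumeration problem is easy to check. The sketch below uses hypothetical function names and assumes a checking rate of 10 arrangements per second, which is how the rate in remark (a) is read here; it counts only the distinct splits of the pooled sample into groups of sizes $m$ and $n$.

```python
from math import comb

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def distinct_splits(m, n):
    """Number of ways to choose which m of the m + n observations form the first sample."""
    return comb(m + n, m)


def years_to_enumerate(m, n, per_second):
    """Years needed to examine every split at the given (assumed) rate."""
    return distinct_splits(m, n) / per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for m in (5, 10, 20):
        print(m, distinct_splits(m, m), f"{years_to_enumerate(m, m, 10):.3g} years")
```

At that assumed rate the $m = n = 20$ case takes several hundred years, consistent in order of magnitude with the remark.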

(b) For some alternatives, the efficiency of the modified test may be better 
than the bound in (8) would indicate, since we would often expect a strictly de- 
creasing sequence in (4). 

(c) For moderate size $s$ there may be reasonable hand-computing procedures.
A possibility is the following: Enter each of the $m + n$ observations on a separate
card. Perform $s$ “random shufflings.” For each shuffle, sum the first $m$
entries and record.

(d) An open problem which may be worth investigating, at least empirically, 
is the following: For what value of s is the modified test already better than some 
given rank order test, or in particular, than the rank order test which is best 
against the alternative under consideration? 
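
The card-shuffling procedure of remark (c) translates directly into a short program. The sketch below is illustrative only: the function name is hypothetical, and the decision rule (reject when at most d of the s shuffled sums exceed the observed sum of the first sample) is an assumption made here to match the form of $\Psi(t) = P(B\{s, t\} \le d)$; the data are invented.

```python
import random

def modified_randomization_test(x, y, s, d, rng):
    """Shuffle the pooled 'cards' s times; count sums of the first m that beat the observed sum."""
    m = len(x)
    observed = sum(x)
    cards = list(x) + list(y)
    exceed = 0
    for _ in range(s):
        rng.shuffle(cards)              # one "random shuffling"
        if sum(cards[:m]) > observed:   # sum the first m entries and compare
            exceed += 1
    # Assumed rule: reject when at most d shuffled sums exceed the observed one.
    return exceed <= d, exceed

# Invented data; s = 99 and d = 4 give (d + 1) / (s + 1) = 0.05.
x = [10, 11, 12, 13, 14, 15]
y = [1, 2, 3, 4, 5, 6]
reject, exceed = modified_randomization_test(x, y, s=99, d=4, rng=random.Random(0))
reject_rev, exceed_rev = modified_randomization_test(y, x, s=99, d=4, rng=random.Random(1))
```

With the first sample consisting of the six largest values, no shuffle can exceed the observed sum, so the hypothesis is rejected; with the samples interchanged almost every shuffle exceeds it.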


4. Generalizations. Lehmann and Stein [1] have studied randomization tests 
in a general framework. We do not describe here the most general setup, but 
rather one to which the results of the earlier sections are adaptable. Suppose
$Z = (Z_1, \ldots, Z_N)$ are $N$ random variables and there is a partition of the sample
space of points $z = (z_1, \ldots, z_N)$ into classes of equivalent points. For instance,
in the Pitman example, two points are equivalent if the coordinates of one can
be obtained from the other by a permutation. For simplicity, we suppose that
with probability one, each equivalence class $T(z)$ contains a finite number, $r$,
of points. Let $H$ be the hypothesis that the distribution of $Z = (Z_1, \ldots, Z_N)$
is, for any $z$, invariant over all the points of $T(z)$. (This is stated here in a some-
what unrigorous way. For the correct statement and for the necessary measura-
bility assumptions, see [1].) A test of $H$ is a function $\varphi$ which assigns to each
point $z$ a number $\varphi(z)$ between zero and one representing the probability of re-
jecting $H$ when $z$ is observed. If


$$\sum_{z' \in T(z)} \varphi(z') = \alpha r$$


identically in $z$, then $\varphi$ is a similar size-$\alpha$ test for testing $H$. Lehmann and Stein
have shown in [1] that under quite general circumstances, a most powerful and
similar size-$\alpha$ test of $H$ against a simple alternative is given by ordering the
points of $T(z)$, so that

$$u(z^{(1)}) \ge \cdots \ge u(z^{(r)})$$

and setting

$$\varphi(z) = \begin{cases}
1 & \text{if } u(z) > u(z^{(1+[\alpha r])}), \\
a & \text{if } u(z) = u(z^{(1+[\alpha r])}), \\
0 & \text{if } u(z) < u(z^{(1+[\alpha r])}),
\end{cases}$$
where $u$ is an appropriately chosen function and $a = a(z)$ is uniquely determined
to provide a size-$\alpha$ test. We assume that the random variable $u(Z)$ has a con-
tinuous cdf and that the size of the test is $k/r$ where $k$ is an integer ($1 \le k \le r$).
The effect of this assumption is to eliminate ties and to provide a nonrandomized
test with probability one. We can now describe a modified test procedure exactly
as was done in the two-sample case above. There is no reason to suppose that $s$
items are to be selected at random and with replacement from the set $T(z)$
when $z$ is observed, however. Any “lot acceptance” plan for deciding whether
or not $r'(z) \le k$ would be appropriate; for instance, the elements of $T(z)$ can
be selected without replacement or sequentially, etc. Let

$$\Psi(t) = P\{\text{deciding } r' \le k \mid r'/r = t\}.$$

Notice that this coincides with the definition in the special case studied pre-
viously. Now Proposition 1 goes through in exactly the same way as before. If
$\Psi(u)$ is a nonincreasing function in (0, 1) (which is a negligible restriction), then
Propositions 2 and 3 also go through exactly as before. It is also true that for
large $r$, $\int_0^1 \Psi(t)\,dt$ is practically equal to $\alpha$ and that the lower bound on the effi-
ciency of the modified test versus the most powerful test under the condition (4')
is practically equal to $\alpha^{-1}\int_0^{\alpha} \Psi(t)\,dt$, but in general these quantities may be
more difficult to compute than they were for the earlier special case.


REFERENCES 
[1] E. L. Lehmann and C. Stein, “On the theory of some nonparametric hypotheses,”
Ann. Math. Stat., Vol. 20 (1949), pp. 28-45.
[2] E. J. G. Pitman, “Significance tests which may be applied to samples from any
populations,” J. Roy. Stat. Soc., Vol. 4 (1937a), pp. 119-130.





ON CERTAIN TWO-SAMPLE NONPARAMETRIC TESTS
FOR VARIANCES¹


By Balkrishna V. Sukhatme


Indian Council of Agricultural Research, New Delhi 


Introduction. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be two samples
of independent observations drawn from two populations with cumulative
distribution functions $F(x)$ and $G(x)$, respectively. We will assume in what
follows that $F$ and $G$ are absolutely continuous and that they are the same
in all respects except that they differ in the scale parameter. The problem con-
sidered here is that of testing the hypothesis

$$H: F = G, \qquad A: F \ne G.$$


If the X’s and the Y’s come from normal populations, the usual test of signif- 
icance for testing the hypothesis H is the variance ratio F-test, which is the 
most commonly used statistical test for comparing variances. Usually however, 
since little is known about the populations from which the samples are drawn, 
this test is used as if the assumption of normality could be ignored. It appears, 
however, that such is not the case. This was first pointed out by E. S. Pearson
[1], who conducted certain experimental investigations. His findings were later 
confirmed by several other authors, especially by Geary [2] and Gayen [3]. 
They showed that the F-test is particularly sensitive to changes in kurtosis 
from the normal theory value of zero. Now, it is easy to see that the F statis- 
tic, when suitably normalised, is asymptotically distribution free. More re- 
cently, Box and Andersen [4] and [5] have studied this problem in great detail 
and have shown on the basis of extensive sampling experiments that the F 
statistic so normalised is insensitive to departures from normality, at least for 
large samples. Very recently attempts have also been made to construct non- 
parametric tests, particularly by Mood [6] and Lehmann [7]. 

The test proposed by Mood is similar to the variance ratio F-test with ranks 
replacing the original observations. He has also computed the asymptotic rela- 
tive efficiency of the test with respect to the F-test for normal alternatives. In 
this paper, we will derive a general formula for the asymptotic relative effi- 
ciency of Mood’s test with respect to the F-test for scalar alternatives but al- 
most arbitrary continuous distributions. 

The test proposed by Lehmann is essentially of the Wilcoxon-Mann-Whitney
type (see [8] and [9]) applied to all possible differences between the X's and
the Y's. As pointed out by Mood, the test is not distribution free. However, he
computed the asymptotic relative efficiency of Lehmann's test with respect to
the F-test on the assumption that the asymptotic variance of the test statistic
is distribution free. It will be shown in this paper that the asymptotic variance
of the test statistic suggested by Lehmann is not independent of the form of
the distribution, even under the null hypothesis F = G.

Lastly, we will propose a new nonparametric test for comparing variances
and obtain a general formula for its asymptotic relative efficiency with respect
to the F-test for scalar alternatives but almost arbitrary continuous distributions.

Received January 29, 1956.

1 This research was performed while the author was at the Statistical Laboratory,
University of California, Berkeley, and was supported by the Office of Ordnance
Research, United States Army, under Contract DA-04-200-ORD-171.


1. Mood's test. This is a dispersion test based on the statistic

$$M = \sum_{i=1}^{n} \left(r_i - \frac{s+1}{2}\right)^2, \tag{1.1}$$

where $r_i$ is the rank of $Y_i$ in the combined sample of $m + n$ observations. We
reject the hypothesis if $M$ is too large. Then, as shown by Mood, under the
null hypothesis

$$E(M) = \frac{n(s+1)(s-1)}{12}, \tag{1.2}$$

$$\operatorname{var}(M) = \frac{mn(s+1)(s+2)(s-2)}{180}, \tag{1.3}$$

and under the alternative

$$E(M) = \frac{n}{12}\left\{3(s+1)^2 - 6(n+1)(s+1) + 2(n+1)(2n+1)\right\}
 - mn\left\{2(m-1)\int FG\,dF + (n-1)\int G^2\,dF - (s-2)\int G\,dF\right\}, \tag{1.4}$$

where, for short, we write $s$ for $m + n$.
Let $G(x) = F(x\theta)$. Then, proceeding as in [6],

$$\left.\frac{dE(M)}{d\theta}\right|_{\theta=1} = -mn(s-2)\left\{2\int xF(x)f(x)\,dF(x) - \int xf(x)\,dF(x)\right\}. \tag{1.5}$$

The efficacy of the M-test is therefore equal to

$$\frac{180\,m^2n^2(s-2)^2\left\{2\int xF(x)f^2(x)\,dx - \int xf^2(x)\,dx\right\}^2}{mn(s+1)(s+2)(s-2)}. \tag{1.6}$$

Also, the efficacy of the variance ratio F-test is

$$\frac{4mn}{(m+n)(\beta_2 - 1)}. \tag{1.7}$$

Hence, the asymptotic relative efficiency of the M-test with respect to the F-
test is given by

$$e_M = 45(\beta_2 - 1)\left\{2\int xF(x)f^2(x)\,dx - \int xf^2(x)\,dx\right\}^2, \tag{1.8}$$

where

$$\beta_2 = \frac{\int (x - EX)^4\,dF(x)}{\left\{\int (x - EX)^2\,dF(x)\right\}^2}.$$


From the formula (1.8), it is obvious that, depending on $f(x)$, $0 \le e_M \le \infty$.
Thus, considering


ed. 


(1.9) fz) => - 


2a 


we find after some computations that 


(1.10) ae ee 
(9 — a) 

which tends to zero as a tends to unity. Thus the asymptotic relative efficiency
of the M-test with respect to the F-test can be made as small as we please.
Similarly, taking $f(x)$ to be a Pearson Type VII density function, it can be
shown that the asymptotic efficiency can be made as large as we please. In
particular, if $f(x)$ is the standard normal density function with mean zero and
variance unity, $e_M = 0.76$. If $f(x)$ is equal to one on the unit interval about
the origin and zero otherwise, then $e_M = 1$.
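
The two numerical values just quoted can be checked by evaluating (1.8) with numerical quadrature. The sketch below is an illustration added here, assuming the form $e_M = 45(\beta_2 - 1)\{2\int xFf^2\,dx - \int xf^2\,dx\}^2$; the helper names and the integration limits are arbitrary choices, not part of the paper.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def are_mood(f, F, a, b):
    """Evaluate (1.8) for density f with cdf F, integrating over [a, b]."""
    mean = simpson(lambda x: x * f(x), a, b)
    mu2 = simpson(lambda x: (x - mean) ** 2 * f(x), a, b)
    mu4 = simpson(lambda x: (x - mean) ** 4 * f(x), a, b)
    beta2 = mu4 / mu2 ** 2
    bracket = (2 * simpson(lambda x: x * F(x) * f(x) ** 2, a, b)
               - simpson(lambda x: x * f(x) ** 2, a, b))
    return 45 * (beta2 - 1) * bracket ** 2

# Standard normal, truncated to [-10, 10] for the quadrature (an assumption).
e_normal = are_mood(lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi),
                    lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2))),
                    -10.0, 10.0)

# Uniform density on the unit interval about the origin.
e_uniform = are_mood(lambda x: 1.0, lambda x: x + 0.5, -0.5, 0.5)
print(round(e_normal, 2), round(e_uniform, 2))
```

For the normal case the bracket reduces to $1/(2\pi\sqrt{3})$, which gives $15/(2\pi^2)$, in agreement with the rounded value 0.76.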


2. Lehmann's test. The test consists in forming all the $\binom{m}{2}$ positive differ-
ences between the $m$ X's and the $\binom{n}{2}$ positive differences between the $n$ Y's.
The test is then based on the statistic

$$L = \binom{m}{2}^{-1}\binom{n}{2}^{-1} \sum_{i<j}\sum_{k<l} \varphi(|X_i - X_j|,\ |Y_k - Y_l|), \tag{2.1}$$

where

$$\varphi(u, v) = \begin{cases} 1 & \text{if } u < v, \\ 0 & \text{otherwise.} \end{cases}$$
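
As an illustration of (2.1), the statistic can be computed by brute force over all pairs of within-sample differences. This is a hedged sketch rather than anything given in the paper: the function name is hypothetical, and it assumes the kernel equals 1 when the X-difference is smaller than the Y-difference, which is what (2.5) implies.

```python
from fractions import Fraction
from itertools import combinations

def lehmann_statistic(xs, ys):
    """Proportion of (X-pair, Y-pair) combinations with |Xi - Xj| < |Yk - Yl|."""
    x_diffs = [abs(a - b) for a, b in combinations(xs, 2)]  # C(m, 2) differences
    y_diffs = [abs(a - b) for a, b in combinations(ys, 2)]  # C(n, 2) differences
    count = sum(1 for u in x_diffs for v in y_diffs if u < v)
    return Fraction(count, len(x_diffs) * len(y_diffs))

# Invented toy samples.
l_one = lehmann_statistic([0, 1], [0, 5])
l_third = lehmann_statistic([0, 1, 3], [0, 2])
print(l_one, l_third)
```

The cost grows like $m^2n^2$, so the enumeration is practical only for small samples.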

Clearly, this is a generalised U statistic in the sense of Lehmann [10] and
hence it follows that if $m = Np$ and $n = Nq$, with $p + q = 1$, $\sqrt{N}(L - EL)$
is asymptotically normally distributed with asymptotic variance $\sigma^2$ given by

$$\sigma^2 = 4\left\{\frac{\xi_1}{p} + \frac{\xi_2}{q}\right\}, \tag{2.2}$$
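where

$$\xi_1 = E[\varphi(|X_1 - X_2|, |Y_1 - Y_2|)\,\varphi(|X_1 - X_3|, |Y_3 - Y_4|)] - E^2\varphi(|X_1 - X_2|, |Y_1 - Y_2|), \tag{2.3}$$

$$\xi_2 = E[\varphi(|X_1 - X_2|, |Y_1 - Y_2|)\,\varphi(|X_3 - X_4|, |Y_1 - Y_3|)] - E^2\varphi(|X_1 - X_2|, |Y_1 - Y_2|), \tag{2.4}$$

and

$$E\varphi(|X_1 - X_2|, |Y_1 - Y_2|) = P(|X_1 - X_2| \le |Y_1 - Y_2|) = \tfrac{1}{2} \tag{2.5}$$

under the hypothesis.

To compute $\xi_1$ and $\xi_2$, we first compute the following, all under the hy-
pothesis $F = G$. Let

$$|X_1 - X_2| = U_1, \quad |X_1 - X_3| = U_2, \quad |Y_1 - Y_2| = V_1, \quad |Y_3 - Y_4| = V_2.$$

Then,

$$K(v_1) = P(V_1 \le v_1) = \int [F(x + v_1) - F(x - v_1)]\,dF(x). \tag{2.6}$$

Also, we have

$$H(u_1, u_2) = P(U_1 \le u_1,\ U_2 \le u_2)
 = \int [F(x + u_1) - F(x - u_1)][F(x + u_2) - F(x - u_2)]\,dF(x). \tag{2.7}$$

Then, we see that, with $k(v) = K'(v)$,

$$\xi_1 = P(U_1 \le V_1,\ U_2 \le V_2) - \tfrac{1}{4} = \iint H(v_1, v_2)\,k(v_1)k(v_2)\,dv_1\,dv_2 - \tfrac{1}{4}. \tag{2.8}$$

Exactly, in the same manner, we find that

$$\xi_2 = \iint K(t_1)K(t_2)\,h(t_1, t_2)\,dt_1\,dt_2 - \tfrac{1}{4}, \tag{2.9}$$

where

$$h(t_1, t_2) = \int [f(x + t_1) + f(x - t_1)][f(x + t_2) + f(x - t_2)]\,dF(x).$$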


Taking 
f(x) -$s2 ; 


otherwise, 







it turns out that 
(2.10) °/N = 1/180-(1/m + 1/n). 
On the other hand, if 
= 4. 
107 


26-38 


(2.11) 
- (1/m + 1/n). 


It follows that the asymptotic variance depends essentially on the form of the 
distribution function F(x), even when F = G. Hence, the test based on the 
statistic L is not asymptotically distribution free. 

From the above results, it seems that Mood’s test is reasonably efficient 
for normal alternatives and highly efficient for some non-normal alternatives. 
The test, however, presupposes knowledge about the relative location of the 
two populations, which is not always present. If it is not, the test can be modi- 
fied by applying the test to the deviations from the sample medians rather 
than to the observations themselves. The modified test is essentially the same 
test, and we would expect the modified test to behave nicely, at least for large 
samples. It will be shown in another paper that the modified test is not asymp- 
totically distribution free in the sense that the asymptotic distribution of 
the test statistic is not independent of the original population from which the 
samples are drawn under the null hypothesis. In the next section, we therefore 
propose another nonparametric test for comparing variances, especially con- 
structed with this object in view. As will be seen in the next section, the test 
is not so efficient as the one proposed by Mood. This test also assumes knowledge
about the relative location of the two populations. It will be shown in
another paper that under certain regularity conditions, the proposed test after 
modification is asymptotically distribution free. 


3. The proposed T-test. The test statistic may be defined as

$$T = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \psi(X_i, Y_j), \tag{3.1}$$

where

$$\psi(X, Y) = \begin{cases} 1 & \text{if either } 0 < X < Y \text{ or } Y < X < 0, \\ 0 & \text{otherwise.} \end{cases}$$

We reject the hypothesis if $T$ is either too large or too small. We shall now
find the mean and the variance of $T$ both under the hypothesis and the alterna-
tive.

Mean and variance of T under the hypothesis.

$$E(T) = E\psi(X_i, Y_j) = P(0 < X_i < Y_j) + P(Y_j < X_i < 0) = \tfrac{1}{4}. \tag{3.2}$$

Squaring and taking expectations and noting that

$$E(\psi(X_i, Y_j)\,\psi(X_i, Y_k)) = E(\psi(X_i, Y_j)\,\psi(X_h, Y_j)) = \tfrac{1}{12},$$

it follows easily that

$$\operatorname{var}(T) = \frac{m + n + 7}{48mn}. \tag{3.3}$$
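
The null moments of T, namely mean 1/4 and variance (m + n + 7)/(48mn), can be checked by simulation. The sketch below is an added illustration with hypothetical function names; it assumes both samples are standard normal, so that the common median is zero as the test presupposes.

```python
import random

def psi(x, y):
    """Kernel of the T statistic: 1 if 0 < x < y or y < x < 0, else 0."""
    return 1 if (0 < x < y) or (y < x < 0) else 0

def t_statistic(xs, ys):
    """T = (1 / mn) * sum over all pairs (X_i, Y_j) of psi."""
    total = sum(psi(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def simulate_null(m, n, reps, rng):
    """Monte Carlo mean and variance of T when F = G (standard normal assumed)."""
    values = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(m)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        values.append(t_statistic(xs, ys))
    mean = sum(values) / reps
    var = sum((v - mean) ** 2 for v in values) / (reps - 1)
    return mean, var

m, n = 5, 5
sim_mean, sim_var = simulate_null(m, n, reps=20000, rng=random.Random(1))
theory_var = (m + n + 7) / (48 * m * n)
print(sim_mean, sim_var, theory_var)
```

Because the kernel depends on the observations only through their order and their signs, any continuous distribution with median zero should give the same null moments.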


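Mean and variance of T under the alternative.

$$E(T) = \int_0^{\infty} (1 - G)\,dF + \int_{-\infty}^{0} G\,dF. \tag{3.4}$$

To find the variance under the alternative, it is easily seen that

$$E(\psi(X_i, Y_j)\,\psi(X_i, Y_k)) = \int_0^{\infty} (1 - G)^2\,dF + \int_{-\infty}^{0} G^2\,dF, \tag{3.5}$$

$$E(\psi(X_i, Y_j)\,\psi(X_h, Y_j)) = \int F^2\,dG - \int F\,dG + \tfrac{1}{4}. \tag{3.6}$$

Whence, we find that

$$\operatorname{var}(T) = \frac{1}{mn}\left[\int_0^{\infty} F\,dG - \int_{-\infty}^{0} F\,dG
 + (n - 1)\left\{\int_0^{\infty} (1 - G)^2\,dF + \int_{-\infty}^{0} G^2\,dF\right\}
 + (m - 1)\left\{\int F^2\,dG - \int F\,dG + \tfrac{1}{4}\right\}
 - (m + n - 1)\left\{\int_0^{\infty} F\,dG - \int_{-\infty}^{0} F\,dG\right\}^2\right], \tag{3.7}$$

which tends to zero as $m$ and $n$ tend to infinity. Thus,

$$T \to \int_0^{\infty} F\,dG - \int_{-\infty}^{0} F\,dG \quad \text{in probability as } m \text{ and } n \to \infty.$$

Hence, the test is consistent.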

Asymptotic efficiency of the T test. We observe that T is a modified form of the
Wilcoxon-Mann-Whitney statistic. Mann and Whitney proved the asymptotic
normality of the Wilcoxon statistic under the hypothesis and Lehmann proved
it under the alternative. Using these results, it follows easily that T is asymp-
totically normally distributed both under the hypothesis and the alternative.
It can also be verified that all the conditions of Pitman's [11] theorem are satis-
fied. We are therefore ready to compute the asymptotic relative efficiency of
the T-test with respect to the variance ratio F-test. We have

$$\left.\frac{dE(T)}{d\theta}\right|_{\theta=1} = -\left\{\int_0^{\infty} xf^2(x)\,dx - \int_{-\infty}^{0} xf^2(x)\,dx\right\}.$$

Efficacy of the T-test is therefore equal to

$$\frac{48mn}{m + n + 7}\left\{\int_0^{\infty} xf^2(x)\,dx - \int_{-\infty}^{0} xf^2(x)\,dx\right\}^2. \tag{3.8}$$

Whence, we find as before that the asymptotic relative efficiency of the T-test
is equal to

$$e_T = 12(\beta_2 - 1)\left\{\int_0^{\infty} xf^2(x)\,dx - \int_{-\infty}^{0} xf^2(x)\,dx\right\}^2. \tag{3.9}$$

It can be demonstrated as before that the efficiency can be anything from zero
to infinity. In particular, if $f(x)$ is the standard normal density function, $e_T =
0.61$. If $f(x) = \tfrac{1}{2}e^{-|x|}$, $e_T = 0.94$.
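
Both quoted values follow from short closed-form integrals. The working below is added for illustration and assumes the efficiency formula $e_T = 12(\beta_2 - 1)\{\int_0^{\infty} xf^2\,dx - \int_{-\infty}^{0} xf^2\,dx\}^2$.

```latex
% Standard normal: f^2(x) = e^{-x^2}/(2\pi), \beta_2 = 3.
\int_0^{\infty} x f^2(x)\,dx = \frac{1}{2\pi}\int_0^{\infty} x e^{-x^2}\,dx = \frac{1}{4\pi},
\qquad
\int_{-\infty}^{0} x f^2(x)\,dx = -\frac{1}{4\pi},
\]
\[
e_T = 12\,(3 - 1)\left(\frac{1}{2\pi}\right)^2 = \frac{6}{\pi^2} \approx 0.61 .
\]
% Double exponential: f(x) = \tfrac12 e^{-|x|}, f^2(x) = \tfrac14 e^{-2|x|}, \beta_2 = 6.
\[
\int_0^{\infty} x f^2(x)\,dx = \frac14 \int_0^{\infty} x e^{-2x}\,dx = \frac{1}{16},
\qquad
\int_{-\infty}^{0} x f^2(x)\,dx = -\frac{1}{16},
\]
\[
e_T = 12\,(6 - 1)\left(\frac{1}{8}\right)^2 = \frac{15}{16} \approx 0.94 .
```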

4. Acknowledgment. The author wishes to express his thanks to Professor 
Erich Lehmann for proposing this investigation and for his interest and helpful 
suggestions during its progress. 


REFERENCES 


[1] E. S. Pearson, “The analysis of variance in cases of non-normal variation,”
Biometrika, Vol. 23 (1931), pp. 114-134.
[2] R. Geary, “Testing for normality,” Biometrika, Vol. 34 (1947), pp. 209-241.
[3] A. K. Gayen, “The distribution of the variance ratio in random samples of any size
drawn from non-normal universes,” Biometrika, Vol. 37 (1950), pp. 236-255.
[4] G. E. P. Box, “Non-normality and tests on variances,” Biometrika, Vol. 40 (1953),
pp. 318-335.
[5] G. E. P. Box and S. L. Andersen, “Permutation theory in the derivation of robust
criteria and the study of departures from assumption,” J. R. Statist. Soc., B,
Vol. 17 (1955), pp. 1-34.
[6] A. M. Mood, “On the asymptotic efficiency of certain nonparametric tests,” Ann. Math.
Stat., Vol. 25 (1954), pp. 514-522.
[7] E. L. Lehmann, “Consistency and unbiasedness of certain nonparametric tests,”
Ann. Math. Stat., Vol. 22 (1951), pp. 165-179.
[8] F. Wilcoxon, “Individual comparison by ranking methods,” Biometrics, Vol. 1 (1945),
pp. 80-83.
[9] H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables
is stochastically larger than the other,” Ann. Math. Stat., Vol. 18 (1947), pp.
50-60.
[10] E. L. Lehmann, unpublished.
[11] G. E. Noether, “Asymptotic properties of the Wald-Wolfowitz test of randomness,”
Ann. Math. Stat., Vol. 21 (1950), pp. 231-246.





MULTI-FACTOR EXPERIMENTAL DESIGNS FOR EXPLORING
RESPONSE SURFACES¹


By G. E. P. Box² and J. S. Hunter³


Summary. Suppose that a relationship $\eta = \varphi(\xi_1, \xi_2, \ldots, \xi_k)$ exists between a
response $\eta$ and the levels $\xi_1, \xi_2, \ldots, \xi_k$ of $k$ quantitative variables or factors, and
that nothing is assumed about the function $\varphi$ except that, within a limited re-
gion of immediate interest in the space of the variables, it can be adequately rep-
resented by a polynomial of degree $d$.

A k-dimensional experimental design of order d is a set of N points in the k- 
dimensional space of the variables so chosen that, using the data generated by 
making one observation at each of the points, all the coefficients in the dth 
degree polynomial can be estimated. 

The problem of selecting practically useful designs is discussed, and in this 
connection the concept of the variance function for an experimental design is 
introduced. Reasons are advanced for preferring designs having a ‘‘spherical’’ 
or nearly ‘“‘spherical’’ variance function. Such designs insure that the estimated 
response has a constant variance at all points which are the same distance from 
the center of the design. Designs having this property are called rotatable designs. 
When such arrangements are submitted to rotation about the fixed center, the 
variances and covariances of the estimated coefficients in the fitted series remain 
constant. 

Rotatable designs having satisfactory variance functions are given for $d = 1, 2$;
and $k = 2, 3, \ldots, \infty$. Blocking arrangements are derived. The simplification
in the form of the confidence region for a stationary point resulting from the use
of a second order rotatable design is discussed.


1. Introduction. Suppose we have $k$ variables or factors whose levels are denoted
by $\xi_1, \xi_2, \ldots, \xi_k$ on which depends the level of some response $\eta$ in accordance
with an unknown relationship

$$\eta = \varphi(\xi_1, \xi_2, \ldots, \xi_k). \tag{1}$$

Suppose that in order to explore this relationship, $N$ experiments are performed.
The $u$th of these experiments consists in adjusting the factor levels to a certain
set of $k$ predecided values, $\xi_{1u}, \xi_{2u}, \ldots, \xi_{ku}$, and of observing a response $y_u$.
The problem of experimental design discussed is that of choosing the $N$ sets of
levels at which observations are to be made. It is often convenient to view the
problem geometrically and to regard Eq. (1) as defining a surface referred to as
the response surface. The $N$ sets of conditions at which the response is observed
will then correspond to $N$ points in the space of the variables called experimental
points.

Received June 21, 1955; revised November 26, 1956.
1 Prepared under the Office of Ordnance Research, Contract No. DA-36-034-ORD-1177
at the Institute of Statistics, Raleigh N.C.
2 Now with the Statistical Techniques Group, Princeton University.
3 Now with the American Cyanamid Company, New York.

1.1 Notation. Following the convention adopted in previous papers [1], [2] 
we shall define a set of standardized levels 


: : 
(2) x ne where 


For these standardized levels therefore 


N 
(3) 7 Lin = 0 and 


u=1 


and for the time being the convention is adopted that c = 1. 

We shall denote by $D$ the $N \times k$ design matrix which provides a program of the
$N$ experiments to be performed. The elements of the $u$th row of this matrix are
the values of the standardized levels $x_{1u}, x_{2u}, \ldots, x_{ku}$ to be used in the $u$th
experiment. These elements also define the $u$th experimental point in the $k$-
dimensional space of the variables. Since the designs we consider may include
many factors, they will be called multi-factor designs.

Using standardized factor levels in accordance with Eq. (3) we can prepare
standard design matrices appropriate for various values of $k$, and for various
types of assumptions concerning the function $\varphi$. In given circumstances the
experimenter can select the appropriate design matrix and choose suitable
average values $\bar\xi_1, \bar\xi_2, \ldots, \bar\xi_k$ and units $S_1, S_2, \ldots, S_k$ so that the design covers
the region of immediate interest in the space of the variables. The level of the
$i$th factor to be used in the $u$th trial is then $\xi_{iu} = \bar\xi_i + S_i x_{iu}$.

We shall assume in what follows that in the limited region of immediate
interest $\varphi$ can be represented by a polynomial of degree $d$ so that the response at
the $u$th point is assumed to be

$$\eta_u = \beta_0 x_{0u} + \beta_1 x_{1u} + \cdots + \beta_k x_{ku} + \beta_{11} x_{1u}^2 + \cdots + \beta_{kk} x_{ku}^2
 + \beta_{12} x_{1u} x_{2u} + \cdots + \beta_{k-1,k}\, x_{k-1,u} x_{ku} + \beta_{111} x_{1u}^3 + \text{etc.} \tag{4}$$

Following convention, we call $x_0;\ x_1, \ldots, x_k;\ x_1^2, \ldots, x_k^2;\ x_1 x_2, \ldots$ etc., the
“independent” variables. When the polynomial is of degree higher than the first,
the “independent” variables are not, of course, functionally independent.

We shall obtain least squares estimates $b_0, b_1$, etc., of the coefficients $\beta_0, \beta_1$,
etc., by fitting Eq. (4) to the $N$ observed values $y_1, y_2, \ldots, y_u, \ldots, y_N$. It is
convenient to write down the constant term as $\beta_0 x_{0u}$ rather than as $\beta_0$, defining
$x_{0u}$ as unity for all values of $u$. We call $\beta_i$ the $i$th linear coefficient, $\beta_{ii}$ the $i$th
quadratic coefficient, $\beta_{ij}$ the $ij$th linear $\times$ linear crossproduct coefficient (or
simply the $ij$th interaction coefficient where no ambiguity will arise) and so on.
The independent variables $x_i$, $x_i^2$, $x_i x_j$ are similarly named.

A design which includes $k$ variables and allows all constants up to order $d$ to
be determined will be called a $k$-dimensional design of order $d$. In a polynomial
equation of degree $d$ there are $\binom{k+d}{d}$ terms, so that for a $k$-dimensional design
of order $d$, the number of experimental points must be at least $\binom{k+d}{d}$.
Cc 


1.2 Factorial designs. A factorial design from which are to be determined all
the polynomial coefficients of order $d$ or less includes all combinations of $d + 1$
levels of each of the factors. The number $(d + 1)^k$ of experimental points so
generated is often excessively large compared with the number $\binom{k+d}{d}$ of
constants to be determined. In five variables for instance, the factorial design
would require $3^5 = 243$ points to determine the 21 constants in the second order
polynomial. The number of experimental points may sometimes be considerably 
reduced by fractional replication [3]. Unfortunately the device of fractional 
replication is not very effective in generating from the higher level factorials 
satisfactory designs of order greater than one. For the particular problem here 
considered of fitting multivariate polynomials to data there seems to be no reason 
for basing experimental arrangements on the factorials and a more fundamental 
approach will be attempted. 
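
The gap between the two counts grows quickly with the number of factors, as a short tabulation shows. This is an illustrative sketch only; the function names are hypothetical.

```python
from math import comb

def polynomial_terms(k, d):
    """Number of coefficients in a polynomial of degree d in k variables."""
    return comb(k + d, d)

def factorial_points(k, d):
    """Number of runs in a full factorial with d + 1 levels of each of k factors."""
    return (d + 1) ** k

# Second order (d = 2) designs for k = 2, ..., 6 factors.
table = [(k, polynomial_terms(k, 2), factorial_points(k, 2)) for k in range(2, 7)]
for k, terms, runs in table:
    print(k, terms, runs)
```

The row for k = 5 reproduces the 21 constants and 243 points cited in the text.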

1.3 Requirements. The following are properties of an experimental design of 
order d which are desirable in the present context. The relative importance of 
these properties depends on the particular experimental situation. To be of value 
for specific purposes a design will not need to possess them all.

(a) The design should allow the approximating polynomial of degree d (ten- 
tatively assumed to be representationally adequate) to be estimated with satis- 
factory accuracy within the region of interest. 

(b) It should allow a check to be made on the representational accuracy of the 
assumed polynomial. 

(c) It should not contain an excessively large number of experimental points. 

(d) It should lend itself to ‘blocking’. 

(e) It should form a nucleus from which a satisfactory design of order d + 1 
can be built in case the assumed degree of polynomial proves inadequate. 

In this paper we are concerned with interpreting (a) in such a way as, where 
possible, to satisfy the other properties also. In Section 2 some general results in 
Least Squares are stated. In Section 3 the criterion of orthogonality is discussed. 
In Section 4 the concept of the variance function for the designs is introduced. 
This indicates the desirability of designs for which the variance is constant at a 
constant distance from the origin of the design. Such designs are called rotatable 
designs and the conditions that such designs must satisfy are derived in Sections 
5 and 6. In Section 7 second order rotatable designs are obtained. The arrange- 
ment into blocks of second order rotatable designs is discussed in Section 8. The 
details of the calculations required when using the designs are given in Section 9.
Completely worked numerical examples will appear in [14]. Section 10 discusses 
the construction of a confidence region for a stationary point. 







2. Least squares results. For any linear model, such as (4), in which there are
$L$ unknown coefficients, the $N$ equations at the $N$ experimental points may be
written in an obvious matrix notation as

$$\eta = X\beta, \tag{4a}$$

where the $N \times L$ matrix $X$ is called the matrix of independent variables.
If the observed values found at the $N$ experimental points are represented by
a vector $Y$ and

$$\mathcal{E}(Y) = \eta; \qquad \mathcal{E}(Y - \eta)(Y - \eta)' = I_N \sigma^2, \tag{5}$$

then, on the supposition that the mathematical model (4a) exactly represents the
true situation, the estimates $B$ of $\beta$ linear in the observations which are unbiased
(i.e., $\mathcal{E}(B) = \beta$) and have severally the smallest possible variances, are those
which reduce to a minimum the sums of squares of discrepancies $(Y - \hat{Y})'(Y - \hat{Y})$
between the observed values $Y$ and the values $\hat{Y} = XB$ given by the fitted func-
tion. These are the “least squares” estimates

$$B = (X'X)^{-1}X'Y. \tag{6}$$

Their variances and co-variances are the elements of the matrix

$$\mathcal{E}(B - \beta)(B - \beta)' = (X'X)^{-1}\sigma^2 \tag{7}$$

and an unbiased estimate of $(N - L)\sigma^2$ is provided by the quantity

$$(Y - \hat{Y})'(Y - \hat{Y}) = Y'Y - B'X'XB. \tag{8}$$

If, contrary to supposition, the mathematical model $\eta = X\beta$ is inadequate and
in fact $L_1$ further terms $X_1\beta_1$ are needed to ensure an adequate representation of
the response so that

$$\eta = X\beta + X_1\beta_1,$$

then the estimates given by (6) are biased for

$$\mathcal{E}(B) = \beta + A\beta_1, \tag{8a}$$

where $A = (X'X)^{-1}X'X_1$ is an $L \times L_1$ matrix of bias coefficients which has been
called, [1], the “alias” matrix. In this situation the residual sum of squares is also
biased and we find

$$\mathcal{E}(Y'Y - B'X'XB) = (N - L)\sigma^2 + \beta_1'(X_1 - XA)'(X_1 - XA)\beta_1
 = (N - L)\sigma^2 + \beta_1'X_1'\left(I - X(X'X)^{-1}X'\right)X_1\beta_1. \tag{8b}$$
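
A small worked case may make the alias matrix concrete. The sketch below is an illustration added here, not a design from the paper: it takes a single factor at the unscaled levels -1, 0, 1, fits a straight line, and treats the omitted quadratic term as the extra column. All function names are hypothetical, and exact rational arithmetic is used so that the entries of $A = (X'X)^{-1}X'X_1$ can be read off as fractions.

```python
from fractions import Fraction

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(a):
    """Gauss-Jordan inverse of a square, nonsingular matrix of Fractions."""
    n = len(a)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def alias_matrix(x, x1):
    """A = (X'X)^{-1} X'X1, the matrix of bias coefficients."""
    xt = transpose(x)
    return matmul(inverse(matmul(xt, x)), matmul(xt, x1))

levels = [-1, 0, 1]                                   # assumed toy design
X = [[Fraction(1), Fraction(u)] for u in levels]      # fitted terms: x0, x1
X1 = [[Fraction(u * u)] for u in levels]              # omitted term: x1 squared
A = alias_matrix(X, X1)
print(A)
```

Here the constant term absorbs two thirds of the omitted quadratic coefficient while the linear coefficient is unbiased, which is the behaviour described by the bias relation for the least squares estimates.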


2.1 The moment matrix. Equations (6), (7), (8), (8a), and (8b) contain the
matrix $X'X$ of sums of squares and products of the independent variables. We
notice that $N^{-1}X'X$ may be viewed as a matrix of moments of the design. For
example, if there are $k = 2$ variables, and we are considering a design of order
two, the equation to be fitted is

$$\eta = \beta_0 x_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 \tag{9}$$

and the matrix $N^{-1}X'X$ is

$$\begin{array}{c|cccccc}
   & 0 & 1 & 2 & 11 & 22 & 12 \\ \hline
0  & 1      & [1]   & [2]   & [11]   & [22]   & [12]   \\
1  & [1]    & [11]  & [12]  & [111]  & [122]  & [112]  \\
2  & [2]    & [12]  & [22]  & [112]  & [222]  & [122]  \\
11 & [11]   & [111] & [112] & [1111] & [1122] & [1112] \\
22 & [22]   & [122] & [222] & [1122] & [2222] & [1222] \\
12 & [12]   & [112] & [122] & [1112] & [1222] & [1122]
\end{array} \tag{10}$$

The quantities in square brackets denote the moments of the design. For ex-
ample, $N^{-1}\sum_{u=1}^{N} x_{1u} = [1]$, $N^{-1}\sum_{u=1}^{N} x_{1u}^2 x_{2u} = [112]$ and so on. We shall call
$N^{-1}X'X$ the moment matrix and its inverse $N(X'X)^{-1}$ the precision matrix. When
$\sigma^2 = 1$ the elements of this latter matrix are the variances and covariances of the
effects measured on a “per-observation” basis.
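
The bracket notation is easy to compute for a concrete design. The following sketch is an added illustration with a hypothetical helper; it uses a three-level factorial in two factors, with the levels scaled (an assumption made here) so that each coordinate has mean zero and mean square one.

```python
import math
from itertools import product

def design_moment(design, indices):
    """N^{-1} times the sum over runs of the product of the named coordinates;
    for example indices (1, 1, 2) gives the moment [112]."""
    total = 0.0
    for run in design:
        term = 1.0
        for i in indices:
            term *= run[i - 1]
        total += term
    return total / len(design)

# Levels -a, 0, a with a^2 = 3/2, so that the moment [11] equals 1.
a = math.sqrt(1.5)
design = list(product((-a, 0.0, a), repeat=2))

m11 = design_moment(design, (1, 1))
m111 = design_moment(design, (1, 1, 1))
m1111 = design_moment(design, (1, 1, 1, 1))
m1122 = design_moment(design, (1, 1, 2, 2))
print(m11, m111, m1111, m1122)
```

Under this scaling the pure fourth moment comes out at 3/2 and the mixed fourth moment at 1, with all odd moments zero.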


3. Orthogonal designs. The problem of choosing a “best” design for the
fitting of a model $\eta = X\beta$ has usually been interpreted as that of satisfying the
requirement that $D$ should be so chosen that the coefficients $\beta$ are separately
estimated with smallest variance. In references [4], [5], [2], and [6] a theorem is
proved (for the case where the variables in the matrix $X$ are functionally inde-
pendent and the diagonal elements of $X'X$ are fixed by the definition of the
problem) that the requirement of smallest variance is satisfied by so choosing $D$
that the matrix $X'X$ is diagonal. Such an arrangement may be called an orthog-
onal design.

In the present context it is only in the case of designs of first order that the 
variables are functionally independent and that all the diagonal elements of 
X’X are fixed by the definition of the problem. For this reason, as we see in more 
detail below, the above theorem is directly helpful only in the derivation of first 
order designs. For higher order designs an alternative approach is necessary. 

3.1 First order designs. In this case the independent variables are $x_0, x_1,
\ldots, x_k$. Since these variables are also functionally independent, and since
$\sum_u x_{iu}^2 = N$ ($i = 0, 1, 2, \ldots, k$) so that the diagonal elements of $X'X$ are fixed by
definition of the problem, the smallest variance theorem referred to above leads at
once to the conclusion that a best design matrix $D$ is one for which $N^{-1}X'X = I$.
It will be noted that for this case we are led to a unique form of moment matrix.
Such a moment matrix is realized in practice simply by choosing $D$ to have
orthogonal columns subject to Eqs. (3).

The construction and properties of such designs are discussed in [2]. Geo-
metrically the designs consist of $N$ points, at the vertices of an $N - 1$ dimen-
sional regular simplex if $k = N - 1$, or the projections onto a space of $k$ dimen-
sions of the vertices of the $N - 1$ dimensional regular simplex if $k < (N - 1)$.
The arbitrariness in the choice of D corresponds to the fact that the simplex may 
be taken in any orientation. This class of designs includes the factorials and 
fractional factorials. These latter designs are of special value because they are 
easy to carry out, they allow the adequacy of the first degree representation to be 
checked and the nature of departures from it to be readily identified, they form 
natural nuclei which can be augmented to form designs of higher order, and they 
are readily arranged in blocks. 

3.2 Second order ‘orthogonal’ designs. For designs of order higher than the
first the quantities $x_0;\ x_1, \ldots, x_k;\ x_1^2, \ldots, x_k^2;\ x_1 x_2, \ldots, x_{k-1} x_k;\ x_1^3$,
etc., are not all functionally independent and a diagonal moment matrix is im-
possible of attainment since, unless the $x_{iu}$ are all zero, certain sums of products
such as those between $x_0$ and $x_i^2$ and between $x_i^2$ and $x_j^2$ are necessarily positive.

Orthogonal second order designs of a sort can be obtained if we redefine the 
independent variables in terms of the orthogonal polynomials. We show below 
however that there is an infinite variety of such designs with widely different 
properties and that these designs do not provide a wholly satisfactory solution to 
our problem. 

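Let $x_i^{(m)}$ be the orthogonal polynomial of $m$th degree for the $i$th variable $x_i$.
Thus

$$x_i^{(m)} = x_i^m + \alpha_{m-1,m}\, x_i^{m-1} + \cdots + \alpha_{1m}\, x_i + \alpha_{0m}, \tag{11}$$

where the $\alpha$'s are chosen so that

$$\sum_{u=1}^{N} x_{iu}^{(m)} x_{iu}^{(p)} = 0, \qquad p \ne m. \tag{12}$$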

Then we can express the original polynomial equations in terms of these ortho- 
gonal polynomials and their products in the form 


n = (XP)(P'3) = X6, 


where P is the matrix transforming the old independent variables to the new. 
For clarity we will discuss the particular case of a two dimensional design of
order 2 but, as will be readily appreciated, the conclusions drawn will be quite 
general. 

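Using (3) with (11) and (12) we have

$$x_i^{(1)} = x_i, \qquad x_i^{(2)} = x_i^2 - [iii]\,x_i - 1, \tag{13}$$

and the second degree equation (9) for $k = 2$ could be written as

$$\eta = (\beta_0 + \beta_{11} + \beta_{22})x_0 + (\beta_1 + [111]\beta_{11})x_1 + (\beta_2 + [222]\beta_{22})x_2
 + \beta_{11}(x_1^2 - [111]x_1 - 1) + \beta_{22}(x_2^2 - [222]x_2 - 1) + \beta_{12}x_1x_2. \tag{14}$$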


The symmetric moment matrix N~’ X’X is then that given below. 







Here a = 
e = [122], f = [1111] — 111) 
g = [1122 
h = {1112} 
j = [1222 
To obtain a diagonal matrix we must evidently choose the design so that 
[12] = [112] = [122] = [1112] = [1222] = 0 and [1122] =1, 


which insures that all the elements of the matrix (15) vanish except those on the 
diagonal. Examples of such designs are the factorials with more than two levels 
and the orthogonal composite designs given by Box and Wilson [1]. 

There seems little justification for limiting consideration to only these arrange- 
ments. In particular nothing in our discussion has indicated that the choice 
[1122] = 1 is necessarily a good one, or that it would not be better to choose some 
other value and let the quadratic effects be correlated. Again, it is far from clear 
what constitutes a “good” choice of the diagonal elements $[iiii] - [iii]^2 - 1$
corresponding to the quadratic constants in the moment matrix. 

Since the scaling of the design has been standardized, $[iii]^2$ and $[iiii]$ are measures
of “skewness” and “kurtosis” for the ith variable. The choice of the moments
$[iii]$ decides the question of whether the marginal distribution of the pattern of
design points for the ith variable is to be symmetric or skew. The choice of the
moments $[iiii]$ decides whether there is to be a tendency to a uniform distribution
of points or to a concentration of points at the center and at the extremes of
the range. Since for all such designs in our conventional scaling the variances of
linear, quadratic and interaction estimates corresponding to the ith variable are
$\sigma^2 N^{-1}$, $\sigma^2 N^{-1}([iiii] - [iii]^2 - 1)^{-1}$ and $\sigma^2 N^{-1}$ respectively, this choice also decides
the relative precision with which linear, quadratic and interaction coefficients are
estimated.

It may be noted for example that, for the $3^k$ factorial design in conventional
scaling, $[iii] = 0$ and $[iiii] = 3/2$ $(i = 1, 2, \ldots, k)$. The variances of the estimates
for the quadratic coefficients are thus twice as large as those for the interaction
coefficients. In terms of estimated derivatives at the center of the design
therefore, the estimated “quadratic” derivative $\partial^2\eta/\partial x_i^2$ has eight times the variance
of the estimated “interaction” derivative $\partial^2\eta/\partial x_i\,\partial x_j$. This was pointed out in [1],
where an intuitive attempt to reduce this apparent unbalance was made by
introducing designs in which quadratic and interaction derivatives were determined
with equal precision. In fact designs both orthogonal and non-orthogonal can be
found for which the relative variances of estimated coefficients of different kinds
can differ over a wide range. Up to this point the present discussion has provided
no satisfactory basis on which a rational choice can be made.

In selecting from possible orthogonal designs it seems at first sight that the 
quantities $[iiii] - [iii]^2 - 1$ should be made as large as possible. This would seem
to give the smallest possible variances for the quadratic effects without affecting 
the precision of the remaining constants. On closer inspection however the ap- 
parent advantage of such a choice turns out to be somewhat illusory because 

(a) The apparent advantage of making the quantity $[iiii] - [iii]^2 - 1$ large
arises only because of the particular scale convention adopted. If for example we 
scaled our designs on the basis of the size of the fourth moment instead of on the 
size of the second moment a contrary conclusion would be reached. 

(b) The quantity $[iiii]$ enters not only into the precision matrix but also into
the alias matrix. In fact for any orthogonal design of this type the expected value
of the ith linear effect is

$\mathcal{E}(b_i) = \beta_i + [iiii]\,\beta_{iii} + \sum_{j \neq i} \beta_{ijj}.$

Thus the apparent reduction in the variance of the quadratic effects is gained 
only at the expense of an increase in possible bias in the linear effects. 

(c) The quadratic effect $\beta_{ii}$ measures curvature of the surface in the direction
of the ith coordinate axis. It is shown in the next section that by attempting to 
measure the precision with which curvature is determined in the directions of the 
coordinate axes we may decrease the precision with which it is determined in 
some other direction which might be of equal importance to the experimenter. 

3.3 Effect of rotation on precision of the estimates. If, as we shall assume, we wish 
to use the design to explore a surface about which little is known, we shall in 
particular not know how the design is oriented relative to the response surface. 
For example suppose the surface could be represented locally by an equation of 
second degree, then the response contours would be a set of conics which could be 
referred to their principal axes. The orientation of these axes relative to the axes 
of the variables would differ from one problem to another. It is of some interest 
therefore to consider how the variances and co-variances of the estimated coef- 
ficients are changed when the design is rotated. 

As an example suppose that we were to use the symmetrical $3^2$ factorial design
to estimate the coefficients in the second degree Eq. (9). Then bearing in mind the
conventions expressed by Eqs. (3) concerning the origin of the design and the size
of the scale factor, we should use the nine combinations of the levels
$x_1 = (-a, 0, a)$ and $x_2 = (-a, 0, a)$, where $a = (3/2)^{1/2}$. In the normal orientation
of the design the variances of the linear effects, quadratic effects and interaction
effects would be $\sigma^2/9$, $2\sigma^2/9$ and $\sigma^2/9$ respectively, and consequently the
corresponding entries in the precision matrix (which measure the variance on a
“per observation” basis for unit experimental error variance) would be 1, 2, and
1, respectively. All the covariances between these effects would be zero. If the
design were rotated through some angle $\theta$ however then, as is illustrated in Fig. 1,
the variances of the quadratic and interaction effects would undergo marked
changes and the quadratic effects would become correlated with each other and
with the interaction effect. Only the linear effects would have constant variance
and would remain unchanged in all orientations.

Fig. 1. Variances of, and correlations between, second order coefficients estimated from a $3^2$
factorial design rotated through an angle $\theta$. (Panels: variances of quadratic terms; variances
of interaction terms; correlations between quadratic terms; correlations between quadratic
and interaction terms.)

We see that the variances of individual coefficients estimated using a design in 
a particular orientation may give a somewhat misleading impression of its 
efficiency. The condition of orthogonality refers to orthogonality in a particular 
orientation, and this property is in general lost on rotation of the design. 
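
The loss of orthogonality on rotation can be checked numerically. The following is a minimal sketch, assuming NumPy and the scaling convention $[ii] = 1$; the function names are illustrative and are not taken from the paper. It forms the precision matrix $N(X'X)^{-1}$ of the $3^2$ factorial after rotating the design points through an angle theta.

```python
import numpy as np

def second_order_model_matrix(points):
    """Columns 1, x1, x2, x1^2, x2^2, x1*x2 for each design point."""
    x1, x2 = points[:, 0], points[:, 1]
    return np.column_stack([np.ones(len(points)), x1, x2, x1**2, x2**2, x1 * x2])

def precision_matrix(points):
    """N (X'X)^{-1}: variances and covariances per observation for unit sigma^2."""
    X = second_order_model_matrix(points)
    return len(points) * np.linalg.inv(X.T @ X)

def rotated_factorial(theta):
    """3^2 factorial with levels (-a, 0, a), a = sqrt(3/2), rotated through theta."""
    a = np.sqrt(1.5)
    levels = np.array([-a, 0.0, a])
    pts = np.array([[u, v] for u in levels for v in levels])
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    return pts @ rotation.T

# Diagonal order: b0, b1, b2, b11, b22, b12
for degrees in (0, 15, 30, 45):
    P = precision_matrix(rotated_factorial(np.radians(degrees)))
    print(degrees, np.round(np.diag(P), 3))
```

In the standard orientation the diagonal reproduces the entries 1, 2 and 1 quoted above for the linear, quadratic and interaction effects. At 45 degrees the quadratic entries fall to 1.25 and the interaction entry rises to 4, while the linear entries stay at 1.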







4. The variance function for the design. We have proceeded so far by con- 
sidering the accuracy with which individual coefficients are estimated. This 
approach does not, for the case of designs of order higher than the first, seem to 
lead to any unique class of solutions, but points to the conclusion that we should 
in some way consider the joint accuracy of the coefficients. We are really in- 
terested in the individual coefficients only in so far as they supply information 
about the surface. To make further progress therefore we consider what we call 
the design “variance function”.

We shall denote the k coordinates $x_1, \ldots, x_i, \ldots, x_k$ of a point in the space
of the variables by the $k \times 1$ vector $x = \{x_i\}$. Suppose that $\hat{y}_x$ is the response
estimated at the point x using a polynomial fitted by least squares to N observations
made in accordance with some experimental design D. The variance
$V(\hat{y}_x)$ of this estimated value is a function of x and $\sigma^2$ and we can reduce $V(\hat{y}_x)$
by increasing N (for example by replicating the points). The quantity $V(x) =
NV(\hat{y}_x)/\sigma^2$, or alternatively its reciprocal $W(x) = \sigma^2/NV(\hat{y}_x)$, is thus a
standardized measure of the accuracy with which the design D allows the response at
the point x to be estimated. $NV(\hat{y}_x)/\sigma^2$ will be called the variance function of the
design and $W(x) = \{V(x)\}^{-1}$ the weight function. For any experimental design
V(x) provides a standardized measure of the precision of the estimated response
at any point in the space of the variables. It is a function of $x_1, x_2, \ldots, x_k$ and
the elements of the precision matrix alone and is uniquely defined for every k
dimensional experimental design of order d.

For example suppose we used the nine points of the $3^2$ symmetrical factorial as
a second order two dimensional design. On the convention that the origin and
scale are chosen so that [1] = [2] = 0 and [11] = [22] = 1, we have for the
variances and covariances of the effects $V(b_0) = (5/9)\sigma^2$, $V(b_1) = V(b_2) = (1/9)\sigma^2$,
$V(b_{11}) = V(b_{22}) = (2/9)\sigma^2$, $V(b_{12}) = (1/9)\sigma^2$ and $\mathrm{Cov}(b_0 b_{11}) = \mathrm{Cov}(b_0 b_{22}) =
-(2/9)\sigma^2$. The variance function for this design is therefore

$V(x) = \frac{N}{\sigma^2} V(\hat{y}_x) = 5 - 3x_1^2 - 3x_2^2 + 2x_1^4 + 2x_2^4 + x_1^2x_2^2,$

the variance contours for which are shown in Fig. 2(i).
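
As a check on this example, the variance function can be evaluated directly as $N f'(X'X)^{-1} f$, where $f$ holds the model terms at the point of interest. The sketch below assumes NumPy; the helper names are hypothetical. It compares the direct evaluation with the quartic polynomial $5 - 3x_1^2 - 3x_2^2 + 2x_1^4 + 2x_2^4 + x_1^2x_2^2$.

```python
import numpy as np

def power_terms(x1, x2):
    """Second order model terms 1, x1, x2, x1^2, x2^2, x1*x2 at one point."""
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

def variance_function(points, x1, x2):
    """V(x) = N f'(X'X)^{-1} f for a second order fit in two variables."""
    X = np.array([power_terms(u, v) for u, v in points])
    f = power_terms(x1, x2)
    return len(points) * f @ np.linalg.solve(X.T @ X, f)

def factorial_polynomial(x1, x2):
    """Closed form of V(x) for the scaled 3^2 factorial."""
    return 5 - 3 * x1**2 - 3 * x2**2 + 2 * x1**4 + 2 * x2**4 + x1**2 * x2**2

a = np.sqrt(1.5)  # level spacing that gives [11] = [22] = 1
factorial = [(u, v) for u in (-a, 0.0, a) for v in (-a, 0.0, a)]

for point in [(0.0, 0.0), (1.0, 0.0), (2**-0.5, 2**-0.5)]:
    print(point, variance_function(factorial, *point), factorial_polynomial(*point))
```

The last two points are both at unit distance from the origin yet give different values (4 and 3.25), so the variance of the $3^2$ factorial is not constant on circles.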

In Figs. 2(ii) and 2(iii) are shown variance functions for other two-dimensional
second order designs mentioned in [1]. The arrangement in figure 2(ii) is the
“pentagonal design” and that in figure 2(iii) is an example of a class of designs,
already referred to in Section 3.2, in which quadratic and interaction derivatives 
are determined with equal precision. It will be seen that the arrangement of 
points in the latter design is in fact almost the same as would be obtained by 
rotating the factorial through 45°. 

If, as we shall suppose, nothing is known in advance about the orientation of
the surface, it seems most appropriate to adopt designs which have variance
functions like that of the pentagonal design, that is to say, designs which generate
information such that the response is estimated with constant variance at all
points equidistant from the origin of the design. When we have no knowledge in
advance of the orientation of the surface relative to the design this also seems
instinctively to be a sensible requirement for designs other than those of second
order.

Fig. 2. Variance contours for some 2 dimensional designs

In general, for any k-dimensional design, if the variance of the response
estimated by the fitted polynomial is a function only of

$\rho = \left(\sum_{i=1}^{k} x_i^2\right)^{1/2},$

so that the variance contours in the space of the variables are circles, spheres or
hyperspheres centered at the origin, the design will be said to have a spherical
variance function $V(\rho)$. An arrangement of points giving such a variance function
will be called a rotatable design.

The remainder of this paper is devoted to constructing rotatable or nearly 
rotatable designs, that is, arrangements of experimental points which symmetri- 
cally generate information in those coordinates regarded as most relevant by the 
experimenter. We shall interpret requirement (a) in Section 1.3 in this sense. 


5. Condition for rotatability. In the developments which follow we need some
properties of derived power and product vectors and Schläflian matrices [7],
[8], and [9]. If $x' = (x_1, x_2, \ldots, x_k)$ then we denote by $x'^{[p]}$ the derived power
vector of degree p. For example if k = 2

$x'^{[1]} = [x_1, x_2]$ and $x'^{[2]} = [x_1^2,\; x_2^2,\; 2^{1/2}x_1x_2],$

and in general $x'^{[p]}$ will contain as elements all the powers and products of total
degree p and less (duly ordered) of the elements in x' with suitable multipliers
attached so that $x'^{[p]}x^{[p]} = (x'x)^p$. If a vector x is transformed to a vector
z by z = Hx, the pth Schläflian matrix $H^{[p]}$ is defined such that $z^{[p]} = H^{[p]}x^{[p]}$.
It is readily confirmed that $[HK]^{[p]} = H^{[p]}K^{[p]}$ and also that if H is orthogonal
then so also is $H^{[p]}$.
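
For k = 2 and p = 2 these properties can be verified directly. The sketch below assumes NumPy and the element ordering $[x_1^2, x_2^2, 2^{1/2}x_1x_2]$; the explicit $3 \times 3$ matrix is obtained by expanding $z = Hx$ by hand for a general $2 \times 2$ matrix and is offered only as an illustration, with hypothetical function names.

```python
import numpy as np

def derived_power_2(x):
    """Derived power vector of degree 2 for k = 2: [x1^2, x2^2, sqrt(2) x1 x2]."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2.0) * x1 * x2])

def schlaflian_2(H):
    """Second Schlaflian matrix of a 2 x 2 matrix H = [[a, b], [c, d]]."""
    (a, b), (c, d) = H
    r = np.sqrt(2.0)
    return np.array([
        [a * a, b * b, r * a * b],
        [c * c, d * d, r * c * d],
        [r * a * c, r * b * d, a * d + b * c],
    ])

theta = 0.7
H = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = np.array([0.3, -1.2])

print(derived_power_2(H @ x))                  # z^[2] computed from z = Hx
print(schlaflian_2(H) @ derived_power_2(x))    # H^[2] x^[2], the same vector
```

The multiplier $2^{1/2}$ on the product term is what makes the inner product of the derived vector with itself equal to $(x'x)^2$ and keeps $H^{[2]}$ orthogonal when H is.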

We need some properties of spherical distribution functions discussed in
references [2], [10]. These distribution functions are of some importance in basic
statistical theory, and especially in randomization theory, but these aspects
are not pursued here.

5.1 Moments of a spherical distribution. If we have a set of random variables,
$z_1, z_2, \ldots, z_k$ which may be regarded as the elements of a random vector z,
and each of which has zero mean and unit variance and if their joint distribution
can be written in the form

(16) $p(z) = k f(z'z), \qquad 0 \le z'z \le W,$

where W may be infinite and k is taken so that the integral over the whole
space is unity, then since the density will be constant on hyper-spheres centered
at the origin of the z's, we shall say that the variables have a spherical
distribution.

Now if all the moments of a distribution exist, and the m.g.f. $\phi(t)$ can be
expanded in an infinite series, we can write this series

(17) $\phi(t) = 1 + \sum_{s=1}^{\infty} \frac{1}{s!}\, t'^{[s]} m_s,$

where $m_s$ is the vector of moments $\mathcal{E}\{z^{[s]}\}$. But for a spherical distribution the
moments are invariant under any orthogonal rotation of the coordinates whence
the m.g.f. is

$\phi(t) = 1 + \sum_{s=1}^{\infty} \frac{1}{s!}\, t'^{[s]} H^{[s]} m_s,$

that is,

(18) $\phi(t) = \phi(H't)$


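for any orthogonal matrix H. Regarding now the matrix H as transforming
the vector t, this implies that $\phi(t)$ is unchanged by any transformation on
t which leaves t't unchanged. The m.g.f. is therefore a function of t't and can be
written in the form

(19) $\phi(t) = 1 + \sum_{p=1}^{\infty} \frac{\lambda_{2p}}{2^p\, p!}\, (t't)^p,$

where the $\lambda$'s are real positive constants depending on the function f in (16).
Equating terms in (17) and (19) and writing $[1^{\alpha_1}, 2^{\alpha_2}, \ldots, k^{\alpha_k}]$ for the moment
$\mathcal{E}[z_1^{\alpha_1} z_2^{\alpha_2} \cdots z_k^{\alpha_k}]$ we have, when all the $\alpha_i$ are even,

(20) $[1^{\alpha_1}, 2^{\alpha_2}, \ldots, k^{\alpha_k}] = \frac{\lambda_\alpha \prod_{i=1}^{k} \alpha_i!}{2^{\alpha/2} \prod_{i=1}^{k} (\tfrac{1}{2}\alpha_i)!},$

and zero when any of the $\alpha_i$ is odd, where $\alpha = \sum_{i=1}^{k} \alpha_i$ is called the order of the moment.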
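
The moment formula just derived is easy to evaluate. The following is a small sketch in plain Python, assuming the form $\lambda_\alpha \prod \alpha_i! \,/\, (2^{\alpha/2} \prod (\tfrac{1}{2}\alpha_i)!)$ for even $\alpha_i$ and zero otherwise; the function name and the dictionary of $\lambda$ values are hypothetical conveniences.

```python
from math import factorial

def spherical_moment(alphas, lam):
    """Moment [1^a1, ..., k^ak] of a spherical distribution.

    alphas: exponents (a1, ..., ak); lam: mapping from order to lambda.
    """
    if any(a % 2 for a in alphas):
        return 0.0  # any odd exponent gives a zero moment
    order = sum(alphas)
    numerator = 1
    denominator = 2 ** (order // 2)
    for a in alphas:
        numerator *= factorial(a)
        denominator *= factorial(a // 2)
    return lam[order] * numerator / denominator

# With every lambda equal to 1 these are the standard normal moments.
normal = {0: 1.0, 2: 1.0, 4: 1.0, 6: 1.0}
print(spherical_moment((4, 0), normal))   # fourth moment of one variable
print(spherical_moment((2, 2), normal))   # mixed fourth moment
```

Whatever value is given to $\lambda_4$, the pure fourth moment is three times the mixed fourth moment, which is the sense in which moments of the same order keep the normal-theory ratios.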
If the z's are independent so that

(21) $p(z) = \prod_{i=1}^{k} p(z_i),$

then it has been shown, in references [11] and [12], that the only spherical distribution
possible is the multi-variate normal with equal variances and zero covariances,
which may be called the spherical multi-normal distribution. For this
distribution the m.g.f. is

(22) $\phi(t) = \exp[\tfrac{1}{2}(t't)]$

and all the $\lambda$'s are equal to unity. We see therefore that for any spherical
distribution, the moments of the same order bear the same relationship to one
another as do the moments for the spherical multi-normal. The moments of
different orders will however depend on the $\lambda$'s and hence on the function f(z'z).
5.2 Variance of an estimated response. Consider the response $\hat{y}_x$ estimated
by a fitted polynomial of degree d at the point whose co-ordinates are given
by the last k elements of the vector x' now defined as $x' = (1, x_1, x_2, \ldots, x_k)$.
The polynomial has $\binom{k+d}{d} = L$ terms and the estimated response at the
point $x_1, x_2, \ldots, x_k$ is

(23) $\hat{y}_x = b_0 + b_1x_1 + b_2x_2 + \cdots + b_kx_k + b_{11}x_1^2 + b_{22}x_2^2 + \cdots$
     $\quad + b_{kk}x_k^2 + b_{12}x_1x_2 + \cdots + b_{k-1,k}x_{k-1}x_k + b_{111}x_1^3 + \text{etc.},$

which may be written

(24) $\hat{y}_x = x'^{[d]} b,$

where the $L \times 1$ vector b contains all the b's with suitable multipliers attached
so that (24) is equivalent to (23). Suppose also that the true response at this
point is given by

(25) $\eta_x = x'^{[d]} \beta.$

Then for a given design matrix D for which there exists a matrix of independent
variables X of full rank L the variance of $\hat{y}_x$ is

(26) $V(\hat{y}_x) = \mathcal{E}\{(\hat{y}_x - \eta_x)(\hat{y}_x - \eta_x)'\} = x'^{[d]}\, \mathcal{E}\{(b - \beta)(b - \beta)'\}\, x^{[d]}$
     $\quad = x'^{[d]} (X'X)^{-1} x^{[d]} \sigma^2.$

Consider now the variance of a second estimated value $\hat{y}_z$, which is the same
distance $\rho$ from the origin and whose co-ordinates are the last k elements of the
vector z = Rx, where R is an orthogonal $(k + 1) \times (k + 1)$ matrix consisting
of an arbitrary orthogonal matrix H bordered by a first row $u' = (1, 0, 0, \ldots, 0)$
and a first column u. Making the substitution in (26) we have

(28) $V(\hat{y}_z) = x'^{[d]} R'^{[d]} (X'X)^{-1} R^{[d]} x^{[d]} \sigma^2$
     $\quad = x'^{[d]} \{R'^{[d]} X'X R^{[d]}\}^{-1} x^{[d]} \sigma^2.$


To satisfy the condition that the variance is constant on spheres centered at
the origin of the design we require therefore that (28) and (26) are identically
equal for every x and every R. Whence

(29) $X'X = R'^{[d]} X'X R^{[d]}$

for every orthogonal matrix R. Now $N^{-1}R'^{[d]}X'XR^{[d]}$ is the moment matrix
for the design matrix HD, and consequently the variance is constant for every
point a distance $\rho$ from the origin if and only if the moment matrix is invariant
under orthogonal transformation of the design matrix. This means that unlike
the $3^2$ factorial design whose behavior under rotation is illustrated in Fig. 1,
every variance and covariance of the b's and all the moments and mixed moments
of the design must remain constant under rotation. We now need to find the
form of moment matrix $N^{-1}X'X$ for which Eq. (29) is satisfied.

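5.3 Moments of a rotatable design. We redefine the vector t' to be $(1, t_1, t_2, \ldots, t_k)$
and consider the expression

(30) $Q = N^{-1} t'^{[d]} X'X\, t^{[d]}$

which is a generating function for the moments of order 2d and less of the design.
More specifically, since if $x_u' = (1, x_{1u}, x_{2u}, \ldots, x_{ku})$, $X'X = \sum_{u=1}^{N} x_u^{[d]} x_u'^{[d]}$, we
have

(31) $Q = N^{-1} \sum_{u=1}^{N} \left(t'^{[d]} x_u^{[d]}\right)\left(x_u'^{[d]} t^{[d]}\right)$
     $\quad = N^{-1} \sum_{u=1}^{N} (t' x_u x_u' t)^d$
     $\quad = N^{-1} \sum_{u=1}^{N} (1 + t_1x_{1u} + t_2x_{2u} + \cdots + t_kx_{ku})^{2d}.$

Thus if we write $[1^{\alpha_1}, 2^{\alpha_2}, \ldots, k^{\alpha_k}]$ for the moment $N^{-1}\sum_{u=1}^{N} x_{1u}^{\alpha_1} x_{2u}^{\alpha_2} \cdots x_{ku}^{\alpha_k}$
then the coefficient of $t_1^{\alpha_1} t_2^{\alpha_2} \cdots t_k^{\alpha_k}$ in Q is

(32) $\frac{(2d)!}{\prod_{i=1}^{k} \alpha_i!\, (2d - \alpha)!}\, [1^{\alpha_1}, 2^{\alpha_2}, \ldots, k^{\alpha_k}].$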


Now from (29) the design is rotatable if and only if

(33) $Q = N^{-1} t'^{[d]} X'X\, t^{[d]} = N^{-1} t'^{[d]} R'^{[d]} X'X R^{[d]} t^{[d]}$
(34) $\quad = N^{-1} (t'R')^{[d]} X'X (Rt)^{[d]};$

that is to say, if any transformation which leaves t't unchanged does not change
Q. Hence the design is rotatable if and only if Q is some function of t't and since
it is a polynomial in the t's it must be of the form

(35) $Q = \sum_{s=0}^{d} a_{2s} \left(\sum_{i=1}^{k} t_i^2\right)^s.$

The coefficient of $t_1^{\alpha_1} t_2^{\alpha_2} \cdots t_k^{\alpha_k}$ in this expression is zero if any of the $\alpha_i$ are
odd integers. If the $\alpha_i$ are even integers the coefficient is

(36) $a_\alpha\, (\tfrac{1}{2}\alpha)! \Big/ \prod_{i=1}^{k} (\tfrac{1}{2}\alpha_i)!.$





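We may now equate coefficients to obtain specific values for the moments to
order $\alpha = 2d$ as follows

(37) $[1^{\alpha_1}, 2^{\alpha_2}, \ldots, k^{\alpha_k}] = a_\alpha\, \frac{(\tfrac{1}{2}\alpha)!\,(2d - \alpha)!}{(2d)!} \cdot \frac{\prod_{i=1}^{k} \alpha_i!}{\prod_{i=1}^{k} (\tfrac{1}{2}\alpha_i)!},$

and if we write

(38) $\frac{a_\alpha\, 2^{\alpha/2}\, (\tfrac{1}{2}\alpha)!\,(2d - \alpha)!}{(2d)!} = \lambda_\alpha,$

then finally the moments of a rotatable design of order d are

(39) $[1^{\alpha_1}, 2^{\alpha_2}, \ldots, k^{\alpha_k}] = 0$ if one or more of the $\alpha_i$ are odd,

     $[1^{\alpha_1}, 2^{\alpha_2}, \ldots, k^{\alpha_k}] = \frac{\lambda_\alpha \prod_{i=1}^{k} \alpha_i!}{2^{\alpha/2} \prod_{i=1}^{k} (\tfrac{1}{2}\alpha_i)!}$ if all of the $\alpha_i$ are even,

which are the moments up to order 2d of a spherical distribution.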
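
These moment conditions can be tested on a concrete arrangement. The sketch below assumes NumPy and uses, purely as an illustration, eight points equally spaced on a circle plus some center points, scaled so that $[11] = [22] = 1$; the function names are hypothetical. For k = 2 and d = 2 the conditions reduce to: all odd moments up to order 4 zero, $[1122] = \lambda_4$ and $[1111] = [2222] = 3\lambda_4$.

```python
import numpy as np

def design_moment(points, alphas):
    """Design moment N^{-1} * sum over u of prod_i x_iu^alpha_i."""
    return float(np.mean(np.prod(points ** np.array(alphas), axis=1)))

def scaled_octagon(n_center):
    """Eight equally spaced points on a circle plus n_center center points,
    scaled so that the second moment [11] equals 1."""
    angles = np.arange(8) * np.pi / 4
    ring = np.column_stack([np.cos(angles), np.sin(angles)])
    pts = np.vstack([ring, np.zeros((n_center, 2))])
    return pts / np.sqrt(np.mean(pts[:, 0] ** 2))

pts = scaled_octagon(5)
lam4 = design_moment(pts, (2, 2))
print("lambda_4 =", lam4)
print("[1111] / lambda_4 =", design_moment(pts, (4, 0)) / lam4)
print("[111], [112] =", design_moment(pts, (3, 0)), design_moment(pts, (2, 1)))
```

For this arrangement $\lambda_4 = N/16$, so the number of center points is what tunes $\lambda_4$. The scaled $3^2$ factorial, by contrast, has $[1111] = 3/2$ and $[1122] = 1$, which does not satisfy the ratio of 3.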

5.4 Effect of transformations on zero moments. It is readily seen that if the 
original variables $x_1, x_2, \ldots, x_k$ are transformed by any non-singular linear
transformation to variables $X_1, X_2, \ldots, X_k$ then any moment of order $\alpha$ for
the new variables will be a linear combination of moments all of order $\alpha$ for the
old variables. It follows in particular that if an arrangement of points is such 
that all the moments of a given odd order are zero then they remain zero in 
every orientation of the arrangement. 


6. Moment requirements for rotatable designs of first and second order. 
It has been shown that the moment matrix for a rotatable design of order d 
has a specific form which is invariant under orthogonal rotation of the axes of 
the variables. The moments which are the elements of this moment matrix are
the same as those of a spherical distribution and are known apart from arbitrary
constants $\lambda_0, \lambda_2, \ldots, \lambda_{2d}$. The variance function $V(\rho)$ for a rotatable design
depends only on $\lambda_0, \lambda_2, \ldots, \lambda_{2d}$ and on $\rho = (x'x)^{1/2}$. In selecting a design of
order d we can proceed as follows:

(a) Using Eq. (39) we first obtain in terms of the $\lambda$'s the form of the moment
matrix for a rotatable design of the required order.

(b) Since for a rotatable design the variance function depends, apart from the
$\lambda$'s, only on $\rho$ it is now comparatively easy to study the effect on the variance
function of varying the $\lambda$'s. Having selected the $\lambda$'s to give a satisfactory variance
function and alias matrix the required form of the moment matrix is completely
defined.

(c) We have then to determine actual arrangements of points which, so far
as possible, satisfy the requirements listed in Section 1.3 as well as these moment
conditions.


From Eq. (39), $\lambda_0 = [0]$ and $\lambda_2 = [ii]$, $(i = 1, 2, \ldots, k)$. Since by convention
$[0] = [ii] = 1$, $\lambda_0$ and $\lambda_2$ are always equal to unity, and no element of choice
for the $\lambda$'s arises with rotatable designs of first order, but only with designs of
order 2, 3, etc., for which the values of $\lambda_4$, $\lambda_6$, etc., must be selected. In making
this selection both the variance function and the alias matrix must be considered.
It will be recalled that the alias matrix contains as elements the
coefficients of biases which arise in the estimated coefficients when the assumed
form of the model is inadequate. For a design of order d it is natural to first
consider biases arising from terms of order d + 1 not allowed for in the assumed
form of the model. If we suppose that the true form of the model is of order
d + 1 it will be clear from the form of the alias matrix A in Eq. (8a) that for a
rotatable design of order d the coefficients of the biases can be completely
expressed in terms of $\lambda_0, \lambda_2, \ldots, \lambda_{2d}$ and the moments of order 2d + 1. When
possible, it is advantageous to use a rotatable design of order d for which all
moments of order 2d + 1 are zero. From Section 5.4 these moments are then
zero in every orientation. The alias matrix A is a function only of the $\lambda$'s and all
biases arising from terms of order d + 1 are avoided except those which arise
inevitably because the bias coefficients are functions of the $\lambda$'s themselves.

The remainder of the present section is devoted to determining the necessary 
form of the moment matrix and to discussing the properties of designs of first 
and second order. Specific second order designs having the required form of 
moment matrix are derived in Section 7. 

6.1 Rotatable designs of order 1. Suppose we have k variables $x_1, x_2, \ldots, x_k$
and we desire to fit a polynomial of degree d = 1, that is to say, a fitted equation
representing a plane

(40) $\hat{y}_x = b_0 + b_1x_1 + b_2x_2 + \cdots + b_kx_k.$

Then from Eq. (39), for a k-dimensional rotatable design of first order

all moments $[i]$ $(i = 1, 2, \ldots, k)$ of order 1 are zero,

mixed moments $[ij]$ $(i \neq j = 1, 2, \ldots, k)$ of order 2 are zero,

quadratic moments $[ii]$ $(i = 1, 2, \ldots, k)$ of order 2 $= \lambda_2 = 1$.
Thus the moment matrix $N^{-1}X'X$ is $I_{k+1}$.

The condition that a first order design is rotatable is thus precisely the same 
as that it should have smallest variances, namely that its moment matrix 
should be the identity matrix. 


The variance function for this type of design is given by

(41) $NV(\hat{y}_x)/\sigma^2 = V(\rho) = (1 + \rho^2),$

where $\rho = \{\sum_{i=1}^{k} x_i^2\}^{1/2}$.
The standardized weight function $W(\rho) = [V(\rho)]^{-1}$, which shows the relative
precision of the estimate $\hat{y}$ at a distance $\rho$ from the center of the design, for any
first order rotatable design is graphed in Fig. 3.
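
A first order rotatable design can be checked against $V(\rho) = 1 + \rho^2$ numerically. The sketch below assumes NumPy and uses, as an illustrative choice rather than a design prescribed in the text, an equilateral triangle of radius $2^{1/2}$, which has $[1] = [2] = [12] = 0$ and $[11] = [22] = 1$; the function names are hypothetical.

```python
import numpy as np

def first_order_variance(points, x):
    """V(x) = N f'(X'X)^{-1} f for the first order model with f = (1, x1, ..., xk)."""
    X = np.column_stack([np.ones(len(points)), points])
    f = np.concatenate([[1.0], x])
    return len(points) * f @ np.linalg.solve(X.T @ X, f)

def scaled_triangle():
    """Equilateral triangle centered at the origin with [11] = [22] = 1."""
    angles = np.radians([90.0, 210.0, 330.0])
    return np.sqrt(2.0) * np.column_stack([np.cos(angles), np.sin(angles)])

design = scaled_triangle()
for rho in (0.0, 0.5, 1.0, 2.0):
    for angle in (0.0, 0.9, 2.3):
        x = rho * np.array([np.cos(angle), np.sin(angle)])
        print(rho, round(first_order_variance(design, x), 6), 1 + rho**2)
```

The computed value depends on the distance $\rho$ only, not on the direction, and the weight function is its reciprocal.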


Setting aside our scaling convention we see that in general the variance of
the estimated response at a distance $\rho$ from the center of any rotatable design
of first order is given simply by

(41a) $V(\hat{y}) = V(b_0) + V(b_i)\rho^2.$

This is of the same form as the well-known formula for a single variable x with
$\rho$ replacing x.

Fig. 3. Weight function for any first order rotatable design

6.11. Biases due to second order coefficients. If it happens, contrary to assumption,
that terms of second order are not negligible, then using Eq. 8a, the expected
values of the estimated coefficients for any first order rotatable design are as
follows

(42) $\mathcal{E}(b_0) = \beta_0 + \boldsymbol{\lambda_2 \sum_{g=1}^{k} \beta_{gg}},$

     $\mathcal{E}(b_i) = \beta_i + \sum_{g=1}^{k} \sum_{h=g}^{k} [ghi]\, \beta_{gh}, \qquad (i = 1, 2, \ldots, k).$


Those terms printed in boldface type inevitably arise but those in ordinary type
may be eliminated by a suitable choice of the design. On our convention $\lambda_2$ is
equal to unity and it follows that, if $\beta_0$ is estimated assuming a first order model,
then when the quadratic coefficients $\beta_{gg}$ are not zero, bias in this estimate is
inevitable. In general the coefficients $[ghi]$ of the biases in the estimates of the
linear coefficients $\beta_i$ will vary as the design is rotated (see reference [2] for
particular examples of this). By selecting a design for which all the third order
moments are zero we may eliminate bias in the estimates of the linear coefficients
in every orientation of the design (see Section 5.4). Specific designs of this sort
were discussed in [1] under the name “first order designs of type B.” They can
be obtained by duplicating with reversed signs any orthogonal first order design,
in particular any of the “simplex” designs of [2], [5]. The two-level factorial
designs and many of the fractional factorials are also examples of particular
orientations of designs of this sort.
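
The effect of duplicating a design with reversed signs can be seen from the alias matrix $(X_1'X_1)^{-1}X_1'X_2$, where $X_1$ holds the fitted first order terms and $X_2$ the omitted second order terms. The sketch below assumes NumPy and k = 2; the triangle is used only as one example of a simplex design, and the function names are hypothetical.

```python
import numpy as np

def alias_matrix(points):
    """Bias coefficients of (b0, b1, b2) due to omitted (b11, b22, b12)."""
    x1, x2 = points[:, 0], points[:, 1]
    X1 = np.column_stack([np.ones(len(points)), x1, x2])   # fitted terms
    X2 = np.column_stack([x1**2, x2**2, x1 * x2])           # omitted terms
    return np.linalg.solve(X1.T @ X1, X1.T @ X2)

def scaled_triangle():
    """Simplex design for k = 2 with [11] = [22] = 1."""
    angles = np.radians([0.0, 120.0, 240.0])
    return np.sqrt(2.0) * np.column_stack([np.cos(angles), np.sin(angles)])

triangle = scaled_triangle()
folded = np.vstack([triangle, -triangle])   # duplicated with reversed signs

print(np.round(alias_matrix(triangle), 4))
print(np.round(alias_matrix(folded), 4))
```

In both cases the first row is (1, 1, 0), the inevitable bias of $b_0$ by $\beta_{11} + \beta_{22}$. The rows for $b_1$ and $b_2$ contain third order moments, which are nonzero for the triangle and vanish for the folded design.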

6.2. Rotatable designs of order two. Suppose we have k variables, and desire
to fit a polynomial of degree two. From (39) the moments of a k-dimensional
second order rotatable design suitable for this situation are such that all odd
moments are zero, and the remaining moments are $[ii] = \lambda_2 = 1$, $[iijj] = \lambda_4$,
$[iiii] = 3\lambda_4$.

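Thus for a second order rotatable design the moment matrix, with rows and
columns ordered as 0; 1, 2, ..., k; 11, 22, ..., kk; 12, 13, ..., (k-1, k), is of the form:

(43) $N^{-1}X'X = \begin{pmatrix} 1 & * & \mathbf{1}' & * \\ * & I_k & * & * \\ \mathbf{1} & * & M & * \\ * & * & * & \lambda_4 I \end{pmatrix},$

where $\mathbf{1}$ is a $k \times 1$ vector of ones, M is the $k \times k$ matrix with diagonal elements
$3\lambda_4$ and off-diagonal elements $\lambda_4$, and the asterisks indicate null submatrices.
The inverse matrix (i.e. the precision matrix) is readily shown to be

(44) $N(X'X)^{-1} = \begin{pmatrix} 2\lambda_4^2(k+2)A & * & -2\lambda_4 A\,\mathbf{1}' & * \\ * & I_k & * & * \\ -2\lambda_4 A\,\mathbf{1} & * & P & * \\ * & * & * & \lambda_4^{-1} I \end{pmatrix},$

where P is the $k \times k$ matrix with diagonal elements $[(k+1)\lambda_4 - (k-1)]A$ and
off-diagonal elements $(1 - \lambda_4)A$, and where

(45) $A = [2\lambda_4\{(k + 2)\lambda_4 - k\}]^{-1}.$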


The variances and covariances of the estimated coefficients using any second
order rotatable design are therefore given by

(46) $NV(b_0)/\sigma^2 = 2\lambda_4^2(k + 2)A;$
     $NV(b_i)/\sigma^2 = 1;$
     $NV(b_{ii})/\sigma^2 = [(k + 1)\lambda_4 - (k - 1)]A;$
     $NV(b_{ij})/\sigma^2 = \lambda_4^{-1};$
     $N\,\mathrm{Cov}(b_0 b_{ii})/\sigma^2 = -2\lambda_4 A;$
     $N\,\mathrm{Cov}(b_{ii} b_{jj})/\sigma^2 = (1 - \lambda_4)A;$

and all the remaining covariances are zero. Thus all first and second degree
coefficients are uncorrelated except the quadratic coefficients. These have a
coefficient of correlation $\{[2/(1 - \lambda_4)] - (k + 1)\}^{-1}$.

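If we put $\lambda_4 = 1$ the correlations between the quadratic coefficients are all
zero and the design is orthogonal (in the sense of Section 3) as well as rotatable.
We then have

(47) $NV(b_0)/\sigma^2 = \tfrac{1}{2}(k + 2); \quad NV(b_i)/\sigma^2 = 1; \quad NV(b_{ii})/\sigma^2 = \tfrac{1}{2};$
     $NV(b_{ij})/\sigma^2 = 1; \quad N\,\mathrm{Cov}(b_0 b_{ii})/\sigma^2 = -\tfrac{1}{2}.$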


The conditions of rotatability and orthogonality together fix the relative variances
for effects of different orders. In particular for designs of this sort the variances
of the quadratic coefficients $b_{ii}$ are one half those of the two-factor interaction
coefficients $b_{ij}$, in contrast with three-level factorial designs for which the
variances of the quadratic coefficients are twice those for the interaction
coefficients. Compared with the factorial in standard orientation, the orthogonal
rotatable design thus places four times as much emphasis on the quadratic
coefficient relative to the interaction coefficients. From (44) the variance function
for any general second order rotatable design is given by

(48) $V(\rho) = A\{2(k + 2)\lambda_4^2 + 2\lambda_4(\lambda_4 - 1)(k + 2)\rho^2 + [(k + 1)\lambda_4 - (k - 1)]\rho^4\},$

and for the particular case of orthogonal second order rotatable design by

(48a) $V(\rho) = \tfrac{1}{2}(k + 2 + \rho^4).$
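
The closed form for $V(\rho)$ can be compared with a direct computation on an actual rotatable arrangement. The sketch below assumes NumPy and k = 2, and uses an octagon with center points as an illustrative design (for which $\lambda_4 = N/16$ under the scaling $[ii] = 1$); the function names are hypothetical.

```python
import numpy as np

def variance_formula(rho, k, lam4):
    """V(rho) = A{2(k+2)lam4^2 + 2 lam4 (lam4-1)(k+2) rho^2 + [(k+1)lam4-(k-1)] rho^4}."""
    A = 1.0 / (2.0 * lam4 * ((k + 2) * lam4 - k))
    return A * (2 * (k + 2) * lam4**2
                + 2 * lam4 * (lam4 - 1) * (k + 2) * rho**2
                + ((k + 1) * lam4 - (k - 1)) * rho**4)

def terms(x1, x2):
    """Second order model terms at one point."""
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

def numeric_variance(points, x1, x2):
    """N f'(X'X)^{-1} f computed directly from the design points."""
    X = np.array([terms(u, v) for u, v in points])
    f = terms(x1, x2)
    return len(points) * f @ np.linalg.solve(X.T @ X, f)

def scaled_octagon(n_center):
    """Eight points on a circle plus center points, scaled so that [11] = 1."""
    angles = np.arange(8) * np.pi / 4
    ring = np.column_stack([np.cos(angles), np.sin(angles)])
    pts = np.vstack([ring, np.zeros((n_center, 2))])
    return pts / np.sqrt(np.mean(pts[:, 0] ** 2))

design = scaled_octagon(5)          # N = 13, lambda_4 = 13/16
lam4 = len(design) / 16.0
for rho in (0.0, 0.5, 1.0, 1.5):
    x1, x2 = rho * np.cos(1.0), rho * np.sin(1.0)
    print(rho, numeric_variance(design, x1, x2), variance_formula(rho, 2, lam4))
```

With eight center points $\lambda_4 = 1$ and the formula reduces to $\tfrac{1}{2}(k + 2 + \rho^4)$. With no center points $(k + 2)\lambda_4 - k = 0$ for this arrangement and the second order model cannot be fitted.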


Setting aside for the moment our scaling convention we see that in general
the variance of the estimated response at a distance $\rho$ from the center of any
rotatable second order design is simply given by

$V(\hat{y}) = V(b_0) + 2\,\mathrm{Cov}(b_0 b_{ii})\rho^2 + V(b_i)\rho^2 + V(b_{ii})\rho^4.$

This, with $\rho$ replacing x, is of the form obtained for a single variable x when the
design employed is any symmetric arrangement.

Fig. 4. Weight functions for second order rotatable designs having various values of
$\lambda_4$ when k = 2

In Fig. 4 the standardized weight function $W(\rho) = \{V(\rho)\}^{-1}$ is graphed for
various values of $\lambda_4$, for the case of a two dimensional second order design.
Similar graphs are obtained for other values of k.

We notice that whatever value of $\lambda_4$ is chosen the precision falls off rapidly
when $\rho$ exceeds unity. If we choose $\lambda_4 = 1$ the design is orthogonal in the sense
discussed previously. When $\lambda_4$ approaches or exceeds unity, the precision,
particularly at the center of the design, is high but the bias coefficients are high
also. It is well to remember at this point that we are comparing designs for which
the “spread” of points, as measured by the marginal second moments
$S_i^2 = N^{-1}\sum_{u=1}^{N}(x_{iu} - \bar{x}_i)^2$, is constant. Such a convention is bound to favor designs
with a high value of $\lambda_4$, so that although the general shape of the design weight
function will be meaningful, the absolute height of the curve at any point will be
to some extent an outcome of this convention. It seems reasonable to seek a
relatively uniform distribution of precision in the immediate vicinity of the
design. This is attained with a value of $\lambda_4$ somewhat less than unity and such
a choice will give satisfactory values for the third order bias coefficient. In
particular, if the variance at $\rho = 1$ is to equal the variance at $\rho = 0$, the
values of $\lambda_4$ shown in Table 1 will be needed.





TABLE 1
Values of $\lambda_4$ required to make the variance at $\rho = 1$ equal to that at $\rho = 0$

k             2       3       4       5       6       7       8

$\lambda_4$   0.7844  0.8385  0.8704  0.8918  0.9070  0.9184  0.9274
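
The tabulated values can be recomputed from the variance function. Setting $V(1) = V(0)$ in $V(\rho) = A\{2(k+2)\lambda_4^2 + 2\lambda_4(\lambda_4 - 1)(k+2)\rho^2 + [(k+1)\lambda_4 - (k-1)]\rho^4\}$ gives the quadratic $2(k+2)\lambda_4^2 - (k+3)\lambda_4 - (k-1) = 0$. The sketch below, in plain Python with a hypothetical function name, takes its positive root.

```python
from math import sqrt

def lambda4_equal_variance(k):
    """Positive root of 2(k+2) L^2 - (k+3) L - (k-1) = 0,
    the condition that V(rho = 1) equals V(rho = 0)."""
    a, b, c = 2 * (k + 2), -(k + 3), -(k - 1)
    return (-b + sqrt(b * b - 4 * a * c)) / (2 * a)

for k in range(2, 9):
    print(k, round(lambda4_equal_variance(k), 4))
```

The roots agree with Table 1 to about three decimal places; small differences in the fourth place appear for some k.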





So far we have supposed that our interest centers on the absolute value of $\eta$
as estimated by $\hat{y}$. In some circumstances we would be more interested in the
relative value of $\eta$ rather than in its absolute magnitude. For example, the
estimation of the slope of a first degree surface, or of the position of the
stationary point on a fitted second degree surface, does not require knowledge of
the absolute value of $\eta$ but only of its value at one point in the space of the
variables relative to its value at some other point.

A natural reference value is $\beta_0$, the response at the origin of the design, and
the appropriate estimate of

$\Delta = \eta - \beta_0 = \beta_1x_1 + \cdots + \beta_kx_k + \beta_{11}x_1^2 + \text{etc.}$

is

$d = b_1x_1 + \cdots + b_kx_k + b_{11}x_1^2 + \text{etc.}$

For the rotatable designs already discussed, $V(d)$ as well as $V(\hat{y})$ is constant at
a constant distance $\rho$ from the center of the design. At a point distance $\rho$ from
the center, for any first order rotatable design, we have simply

(49) $V(d) = V(b_i)\rho^2$

and for any second order rotatable design

(49a) $V(d) = V(b_i)\rho^2 + V(b_{ii})\rho^4.$


It is not our intention at this time to claim optimal properties for the designs
which are here derived. The only justification presented is that these arrangements
symmetrically generate information in the coordinates regarded as most
relevant by the experimenter. An approach which makes it possible to measure
the practical efficiency of these and other arrangements has however been
attempted and it is hoped to publish this elsewhere. In this more complete
appraisal of the situation, a criterion on the effectiveness of the design is taken
as the integrated mean square error of $\hat{y}$ over the region of interest to the
experimenter. The scale of the design, and the constants such as $\lambda_4$, which reflect
the distribution of the experimental points, are so chosen as to minimize this
criterion. The integrated mean square error contains two terms: one which
measures the variance of $\hat{y}$ and the other the bias in $\hat{y}$ due to possible
inadequacies of the assumed model. As $\lambda_4$ is increased the variance term becomes
smaller and the bias term becomes larger. The problem is to strike some compromise
which will be satisfactory for practical situations. It appears that the
values of $\lambda_4$ given in Table 1, although perfectly satisfactory, may be a little
high in the light of this more recent work.

In the present paper we consider the nature of the bias in the estimates of the
individual coefficients rather than the nature of the bias in $\eta$.

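6.21. Biases due to third order constants. If it happens, contrary to assumption,
that terms of third order are not negligible, then using equation 8(a),
the expected values of the estimated constants for any second order rotatable
design are as follows:

(50) $\mathcal{E}(b_0) = \beta_0 - 2\lambda_4 A \sum_{f=1}^{k}\sum_{g=f}^{k}\sum_{h=g}^{k}\sum_{i=1}^{k} [fghii]\,\beta_{fgh},$

(51) $\mathcal{E}(b_i) = \beta_i + \boldsymbol{3\lambda_4\beta_{iii}} + \boldsymbol{\lambda_4 \sum_{h \neq i} \beta_{hhi}},$

(52) $\mathcal{E}(b_{ii}) = \beta_{ii} + (2\lambda_4)^{-1} \sum_{f=1}^{k}\sum_{g=f}^{k}\sum_{h=g}^{k} [fghii]\,\beta_{fgh}$
     $\quad + (1 - \lambda_4)A \sum_{f=1}^{k}\sum_{g=f}^{k}\sum_{h=g}^{k}\sum_{j=1}^{k} [fghjj]\,\beta_{fgh},$

(53) $\mathcal{E}(b_{ij}) = \beta_{ij} + \lambda_4^{-1} \sum_{f=1}^{k}\sum_{g=f}^{k}\sum_{h=g}^{k} [fghij]\,\beta_{fgh}.$

Again terms printed in boldface type arise inevitably but those in ordinary
type may be eliminated in every orientation by selecting a design for which
all the fifth order moments are zero.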

The requirements and properties of rotatable designs of third and higher
order may be studied in the same way as for designs of order 1 and 2. We shall 
not pursue this topic here but will now consider how we may obtain actual 
arrangements of points which satisfy the requirements for rotatable designs of 
second order. 


7. Examples of second order rotatable designs. The above discussion has 
been directed to deciding what type of design we should be seeking. We have 
shown the moment conditions which a design of any given order must satisfy 
to obtain constant precision on spheres centered at the origin of the design. 
We now consider the problem of finding arrangements of experimental points 
which satisfy these conditions. The classes of rotatable designs we discuss are 
by no means exhaustive but rather are intended to illustrate some of the possi- 
bilities. 

We mention in passing that any design of order d which is both orthogonal,
in the sense already discussed, and rotatable has moments up to order 2d which
are the same as those of the spherical multinormal distribution. This is of interest
since it shows that in the usual multiple regression problem, where the values of
the independent variables $x_1, x_2, \dots, x_k$ are not held at predetermined levels
but are allowed to vary at random, we should obtain a good arrangement if it
happened that the x's followed a multivariate normal distribution with zero
covariances, a conclusion which is intuitively very acceptable. Rotatable designs could
be approximated simply by taking independent random samples from a normal
distribution, but in fact it is possible to satisfy the criterion of rotatability exactly.

We have seen that for a rotatable design of order d the moments must be the 
same up to order 2d as those of a spherical density function. This suggests that 
we might construct rotatable designs by equally spacing the available finite num- 
ber of experimental points on one or more spheres. 

We find in fact that it is convenient to regard designs as built up from a number
of component sets of points, each set having its points all equidistant from the
origin. This we call an equiradial set, and $\rho$, the distance of each point from the
origin, the radius of the set. If the moments to order 2d of such a set are unchanged
by rotation we call this an equiradial rotatable set of order d.

An equiradial rotatable set of order d does not necessarily, or even usually, of
itself provide a design. For example, n points at the origin provide an equiradial
set of infinite order. Furthermore, no single equiradial set can provide a design
of order greater than one, for if $\rho$ is the common distance from the origin then

      $\sum_{i=1}^{k} x_{iu}^2 = \rho^2 x_{0u}^2; \qquad u = 1, 2, \dots, N.$

It follows that if d > 1 the matrix of independent variables for such an arrangement
is singular and the quadratic coefficients $b_{11}, b_{22}, \dots, b_{kk}$ and the constant
term $b_0$ cannot be separately estimated.

As is shown later we can obtain equiradial sets of points which satisfy the
moment conditions (39) for a rotatable design of order 2 but only for such values of
$\lambda_4$ as lead to a singular moment matrix. For such a set of points
$\rho^2 = \sum_{i=1}^{k} x_{iu}^2$ for $u = 1, 2, \dots, N$, whence

(54)  $\rho^2 = N^{-1} \sum_{u=1}^{N}\sum_{i=1}^{k} x_{iu}^2 = \sum_{i=1}^{k} [ii] = k;$

also,

(55)  $\rho^4 = N^{-1} \sum_{u=1}^{N} \Bigl(\sum_{i=1}^{k} x_{iu}^2\Bigr)^2
             = \sum_{i=1}^{k} [iiii] + \sum_{i=1}^{k}\sum_{j \ne i}^{k} [iijj],$

whence

(56)  $3k\lambda_4 + k(k - 1)\lambda_4 = k^2$

and

(57)  $\lambda_4 = k/(k + 2).$

When this value of $\lambda_4$ is substituted in (45) and (46) the quantity A becomes
infinite and the quadratic terms and the constant term are not separately estimable.
We shall refer to this value of $\lambda_4$ as the "singular" value.

Now although no single equiradial set can supply a usable second order design, 
two or more sets can do so. Suppose we have s equiradial rotatable sets of points 







having the same origin such that in the wth set there are $n_w$ points each at a
distance $\rho_w$ from the origin; then the value of $\lambda_4$ for the whole aggregate of the
$N = \sum_{w=1}^{s} n_w$ points is

(58)  $\lambda_4 = \dfrac{N k \sum_{w=1}^{s} n_w \rho_w^4}{(k + 2)\bigl(\sum_{w=1}^{s} n_w \rho_w^2\bigr)^2},$

which will not in general have a singular value. By combining equiradial sets we
can thus obtain rotatable designs. Since a set of points at the origin will affect
only the value of N in (58) the formula may be applied without modification to
designs for which one set of points is at the center. In practice, we shall find that
the placing of one or more points at the center of an equiradial rotatable set
provides one useful means of modifying the value of $\lambda_4$. If there are $n_1$ points at
the origin and $n_2$ points in the equiradial rotatable set we see that for the
aggregate of $n_1 + n_2$ points

(59)  $\lambda_4 = \dfrac{k(n_1 + n_2)}{(k + 2)\,n_2}.$

We now show that sets of points which are equally spaced on a circle, a sphere,
or a hypersphere and which thus form the vertices of a regular polygon, polyhedron,
or polytope, can provide rotatable sets which may be combined to form
rotatable designs. Our study begins with the two dimensional figures.

7.1 Two dimensional designs. We first show that for the vertices of a regular
n-gon, all moments up to order n − 1 are invariant under rotation.


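Suppose the coordinates of the uth point are $\rho\cos(\varphi + 2\pi u/n)$ and
$\rho\sin(\varphi + 2\pi u/n)$ and that $\alpha = e^{i\varphi}$ and $\omega = e^{2\pi i/n}$. We have for the
moment [pq], which is of order p + q,

(60)  $[pq]_\varphi = (\tfrac12\rho)^{p+q}\, n^{-1}\, i^{-q} \sum_{u=0}^{n-1}
      (\alpha\omega^{u} + \alpha^{-1}\omega^{-u})^{p} (\alpha\omega^{u} - \alpha^{-1}\omega^{-u})^{q}.$

After expanding the bracketed expressions and collecting terms

(61)  $[pq]_\varphi = (\tfrac12\rho)^{p+q}\, n^{-1}\, i^{-q} \sum_{r=0}^{p}\sum_{t=0}^{q}
      (-1)^{t}\binom{p}{r}\binom{q}{t}\, \alpha^{p+q-2(r+t)} \sum_{u=0}^{n-1} \omega^{u\{p+q-2(r+t)\}}.$

By putting $\alpha = 1$ in this expression and subtracting the result from (61) we
obtain the change in the value of the moment after rotation through an angle $\varphi$:

(62)  $[pq]_\varphi - [pq]_0 = (\tfrac12\rho)^{p+q}\, n^{-1}\, i^{-q} \sum_{r=0}^{p}\sum_{t=0}^{q}
      (-1)^{t}\binom{p}{r}\binom{q}{t}\, \{\alpha^{p+q-2(r+t)} - 1\} \sum_{u=0}^{n-1} \omega^{u\{p+q-2(r+t)\}}.$

Now

(63)  $\sum_{u=0}^{n-1} \omega^{u\{p+q-2(r+t)\}} =
      \begin{cases} n & \text{if } p + q - 2(r + t) = 0 \text{ or } mn, \text{ where } m \text{ is an integer},\\
                    0 & \text{otherwise,} \end{cases}$

and $-(p + q) \le p + q - 2(r + t) \le p + q$. Thus if p + q < n the expression
on the left of (63) is zero unless p + q = 2(r + t), but in this case
$\alpha^{p+q-2(r+t)} - 1 = 0$. Hence if p + q < n then (62) is zero whatever the value
of $\varphi$ and our assertion is proved.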
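
The invariance just proved can also be checked numerically. The following is only an illustrative sketch, not part of the original argument: the function names `ngon_moment` and `max_change_under_rotation` and the trial angles are assumptions chosen for the example. It averages $x_1^p x_2^q$ over the vertices of a regular n-gon and reports how much each moment of a given order changes when the figure is turned.

```python
import math

def ngon_moment(n, p, q, phi=0.0, rho=1.0):
    """Moment [pq]: average of x1**p * x2**q over the n vertices of a regular
    n-gon of radius rho whose first vertex lies at angle phi."""
    total = 0.0
    for u in range(n):
        theta = phi + 2.0 * math.pi * u / n
        total += (rho * math.cos(theta)) ** p * (rho * math.sin(theta)) ** q
    return total / n

def max_change_under_rotation(n, order, angles=(0.3, 1.1, 2.5)):
    """Largest change in any moment with p + q == order over the trial angles."""
    worst = 0.0
    for p in range(order + 1):
        q = order - p
        reference = ngon_moment(n, p, q, 0.0)
        for phi in angles:
            worst = max(worst, abs(ngon_moment(n, p, q, phi) - reference))
    return worst

if __name__ == "__main__":
    # For the pentagon, orders 1 to 4 should be unchanged; order 5 should not.
    for order in range(1, 7):
        print(order, max_change_under_rotation(5, order))
```

For a pentagon the reported change should be at rounding-error level for orders 1 to 4 and clearly nonzero for order 5, in line with the bound p + q < n.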

A class of two-dimensional second order rotatable designs may be constructed
therefore from two or more concentric rings of equispaced points with unequal
radii. Points at the origin constitute a ring of zero radius and each ring which is
not of zero radius must contain at least five points. The number of the points in
each set and the radial distances will determine the value of $\lambda_4$ in accordance with
equation (58).

Of this class of designs the simplest are those having one ring of $n_2 = 5$ equally
spaced points with additional $n_1$ points at the origin.

Pentagonal designs with center points. Putting $n_2 = 5$ we obtain the following
values of $\lambda_4$ for specimen numbers of center points:

      Number of points at center of pentagon ....   1     3     5
      $\lambda_4$ ......................................   0.6   0.8   1.0

By using three points at the center a value of $\lambda_4 = 0.8$ is obtained which is close
to that given in Table 1. For orthogonality five center points are required.
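
Equation (58) is easy to evaluate directly. The sketch below is offered only as an illustration; the function name `lambda4` and the representation of a design as a list of (number of points, radius) pairs are assumptions made for the example, with center points entered as a set of radius zero.

```python
def lambda4(k, sets):
    """Value of lambda_4 from Eq. (58) for an aggregate of equiradial
    rotatable sets in k dimensions.

    sets is a list of (n_w, rho_w) pairs; center points are a set with rho_w = 0.
    """
    N = sum(n for n, _ in sets)
    sum_rho4 = sum(n * rho ** 4 for n, rho in sets)
    sum_rho2 = sum(n * rho ** 2 for n, rho in sets)
    return N * k * sum_rho4 / ((k + 2) * sum_rho2 ** 2)

if __name__ == "__main__":
    # Pentagon (n_2 = 5) with 1, 3 and 5 center points.
    for n_center in (1, 3, 5):
        print(n_center, lambda4(2, [(n_center, 0.0), (5, 1.0)]))
```

With a single set and no center points the function returns the singular value k/(k + 2) whatever the radius, which is a convenient check on the formula.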

Hexagonal designs with center points. If we put $n_2$ equal to six so that the
external points are at the vertices of a hexagon, we obtain the added advantage
that all the moments of order 5 are zero, thus insuring that in every orientation
of the design the estimate $b_0$ and the estimates $b_{ii}$, $b_{ij}$ of the second order
coefficients are not biased by any third order terms. A value of $\lambda_4$ close to that given
in Table 1 is obtained by placing $n_1 = 3$ points at the center. "Orthogonality"
is obtained when $n_1 = 6$.

Designs containing two rings of points. A variety of designs, which however use
more than ten points, can be obtained by combining two or more concentric
circles of equispaced points. Table 2, below, shows some of the alternatives.
Values of the ratio of radii are shown (i) which give $\lambda_4 = 0.7844$, the value given
in Table 1, and (ii) which give $\lambda_4 = 1.0$, the value required for orthogonality.


TABLE 2

Radii for equispaced points on concentric circles

      $n_1$                                      5        5        6
      $n_2$                                      7        8        7
      $\rho_2/\rho_1$ for $\lambda_4 = 0.7844$         0.438    0.454    0.407
      $\rho_2/\rho_1$ for $\lambda_4 = 1.0$            0.267    0.304    0.189
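
Ratios of this kind follow from Eq. (58), which for two rings becomes a quadratic in $(\rho_2/\rho_1)^2$. The following is a sketch under stated assumptions rather than the authors' computation: the function name `ring_ratio` is hypothetical, and the choice of the smaller positive root (ring 2 taken as the inner ring) is an assumption made so that the ratio is below one.

```python
import math

def ring_ratio(n1, n2, lam4, k=2):
    """Ratio rho_2/rho_1 (inner ring over outer ring) giving the requested
    lambda_4 for two concentric rings with n1 and n2 points, from Eq. (58).

    With t = (rho_2/rho_1)**2, Eq. (58) reads
        N*k*(n1 + n2*t**2) = lam4*(k + 2)*(n1 + n2*t)**2.
    """
    N = n1 + n2
    c = lam4 * (k + 2)
    A = N * k * n2 - c * n2 ** 2
    B = -2.0 * c * n1 * n2
    C = N * k * n1 - c * n1 ** 2
    if abs(A) < 1e-12:                      # equation degenerates to linear
        roots = [-C / B]
    else:
        disc = B * B - 4.0 * A * C
        if disc < 0.0:
            raise ValueError("no real ratio gives this value of lambda_4")
        roots = [(-B + sign * math.sqrt(disc)) / (2.0 * A) for sign in (1.0, -1.0)]
    positive = [t for t in roots if t > 0.0]
    if not positive:
        raise ValueError("no positive ratio gives this value of lambda_4")
    return math.sqrt(min(positive))

if __name__ == "__main__":
    for n1, n2 in ((5, 7), (5, 8), (6, 7)):
        print(n1, n2, ring_ratio(n1, n2, 0.7844), ring_ratio(n1, n2, 1.0))
```

The printed ratios should agree with the entries of Table 2 to within rounding.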

The arrangements so far discussed by no means exhaust the possible second 
order designs in two dimensions. A further class of designs is obtained by com- 
bining sets of equiradial points which are not individual rotatable sets of order 2. 
For example, sets of three points each of which form the vertices of an equilateral 
triangle with center coincident with the origin, may be combined to form such 







arrangements. Suppose that a line, assumed to be of length $\rho\sqrt{2}$ and connecting
one of the vertices of an equilateral triangle to the center, makes an angle $\varphi$
with the $x_1$ axis. Then it is easily shown that the second order moment matrix
for this arrangement, with rows and columns corresponding to
$1, x_1, x_2, x_1^2, x_2^2, x_1x_2$, is

      $\begin{pmatrix}
      1      & \cdot    & \cdot     & \rho^2            & \rho^2            & \cdot \\
      \cdot  & \rho^2   & \cdot     & \rho^3 b          & -\rho^3 b         & \rho^3 a \\
      \cdot  & \cdot    & \rho^2    & \rho^3 a          & -\rho^3 a         & -\rho^3 b \\
      \rho^2 & \rho^3 b & \rho^3 a  & \tfrac32\rho^4    & \tfrac12\rho^4    & \cdot \\
      \rho^2 & -\rho^3 b & -\rho^3 a & \tfrac12\rho^4   & \tfrac32\rho^4    & \cdot \\
      \cdot  & \rho^3 a & -\rho^3 b & \cdot             & \cdot             & \tfrac12\rho^4
      \end{pmatrix},$

where $a = 2^{-1/2}\sin 3\varphi$ and $b = 2^{-1/2}\cos 3\varphi$.
This is of the form required for rotatability except for the elements containing
a and b. Suppose s arrangements of this sort are combined, the wth such arrangement
having $\rho = \rho_w$ and $\varphi = \varphi_w$. Then the moment matrix will be of the exact
form required for rotatability if

(64)  $\sum_{w=1}^{s} \rho_w^3 \sin 3\varphi_w = 0$  and  $\sum_{w=1}^{s} \rho_w^3 \cos 3\varphi_w = 0.$

This implies simply that the sum of the s vectors

      $(\rho_1^3\cos 3\varphi_1,\ \rho_1^3\sin 3\varphi_1),\ \dots,\ (\rho_s^3\cos 3\varphi_s,\ \rho_s^3\sin 3\varphi_s)$

shall be zero (i.e., they form the sides of some polygon) and an infinity of solutions
is at once obtainable.
The value of $\lambda_4$ for such a design is

(65)  $\lambda_4 = \tfrac12 s \sum_{w=1}^{s} \rho_w^4 \Big/ \Bigl(\sum_{w=1}^{s} \rho_w^2\Bigr)^2.$

If s = 2, then to satisfy (64) the two vectors must be of the form

      $(\rho^3\cos 3\varphi,\ \rho^3\sin 3\varphi)$ and $(-\rho^3\cos 3\varphi,\ -\rho^3\sin 3\varphi)$,

where $\varphi$ is arbitrary. The design then consists of the vertices of a hexagon with
$\lambda_4$ equal to its singular value.

If s > 2 an infinite variety of these designs, in which the sets of points have
different radii and appropriate relative orientations, can be derived. The largest
value of $\lambda_4$ is obtained when all but two of the $\rho_w$ are zero. The two non-zero
vectors are equal in magnitude and opposite in sign and $\lambda_4 = \tfrac14 s$. The resulting
design then consists of the vertices of a hexagon in any orientation together with
the remaining 3(s − 2) points at the center.

Designs may similarly be built up from combinations of regular figures having 







2 and 4 points. As before the maximum value of $\lambda_4$ is obtained by having a single
ring of equispaced points plus points at the center.

7.2 Rotation of Set of Points in k dimensional space. In order to investigate the 
possibilities in more than two dimensions we first consider a method for studying 
the effect on the moment matrix of rotating any k-dimensional arrangement of 
experimental points. 

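Consider some set of N points in k dimensional space and as before denote by
D the N × k design matrix, the elements of whose rows are the coordinates of
the points. To correspond to our definition of derived power vectors suppose a
second degree model written so that product terms such as $x_ix_j$ have the
coefficient $2^{1/2}$ attached. For example, if k = 3

(66)  $\eta = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_{11}x_1^2 + \beta_{22}x_2^2 + \beta_{33}x_3^2
            + \frac{\beta_{12}}{\sqrt2}(\sqrt2\,x_1x_2) + \frac{\beta_{13}}{\sqrt2}(\sqrt2\,x_1x_3) + \frac{\beta_{23}}{\sqrt2}(\sqrt2\,x_2x_3).$

As before we denote by X the $N \times \tfrac12(k + 1)(k + 2)$ matrix of independent
variables corresponding to D. If the set of points is submitted to some rotation
the new design matrix will be DH where H is a k × k orthogonal matrix. The
matrix of independent variables will be transformed to $\tilde{X} = XG$ where G is a
matrix derived from H which when partitioned after the 1st and (k + 1)th row
and column has the form

(67)  $G = \begin{pmatrix} 1 & \cdot & \cdot \\ \cdot & H & \cdot \\ \cdot & \cdot & H^{[2]} \end{pmatrix}.$

The partitioning will be seen to correspond to the separation of the constant
term, first order coefficients and second order coefficients. The matrix $H^{[2]}$ is
the second Schlaflian matrix derived from H which may itself be conveniently
partitioned after its kth row and column and is of the form

(68)  $H^{[2]} = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}.$

This partitioning corresponds to the separation of quadratic and interaction
effects. For example, for k = 3, if

(69)  $H = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}$

then

(70) and (71)
      $(\alpha \mid \beta) = \left(\begin{array}{ccc|ccc}
      h_{11}^2 & h_{12}^2 & h_{13}^2 & 2^{1/2}h_{11}h_{12} & 2^{1/2}h_{11}h_{13} & 2^{1/2}h_{12}h_{13} \\
      h_{21}^2 & h_{22}^2 & h_{23}^2 & 2^{1/2}h_{21}h_{22} & 2^{1/2}h_{21}h_{23} & 2^{1/2}h_{22}h_{23} \\
      h_{31}^2 & h_{32}^2 & h_{33}^2 & 2^{1/2}h_{31}h_{32} & 2^{1/2}h_{31}h_{33} & 2^{1/2}h_{32}h_{33}
      \end{array}\right),$

(72)  $\gamma = \begin{pmatrix}
      2^{1/2}h_{11}h_{21} & 2^{1/2}h_{12}h_{22} & 2^{1/2}h_{13}h_{23} \\
      2^{1/2}h_{11}h_{31} & 2^{1/2}h_{12}h_{32} & 2^{1/2}h_{13}h_{33} \\
      2^{1/2}h_{21}h_{31} & 2^{1/2}h_{22}h_{32} & 2^{1/2}h_{23}h_{33}
      \end{pmatrix},$

(73)  $\delta = \begin{pmatrix}
      h_{11}h_{22} + h_{12}h_{21} & h_{11}h_{23} + h_{13}h_{21} & h_{12}h_{23} + h_{13}h_{22} \\
      h_{11}h_{32} + h_{12}h_{31} & h_{11}h_{33} + h_{13}h_{31} & h_{12}h_{33} + h_{13}h_{32} \\
      h_{21}h_{32} + h_{22}h_{31} & h_{21}h_{33} + h_{23}h_{31} & h_{22}h_{33} + h_{23}h_{32}
      \end{pmatrix},$

where the terms are arranged in the order shown in Eq. (66).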

The original moment matrix is $N^{-1}X'X$. After rotation the moment matrix is
$N^{-1}\tilde{X}'\tilde{X} = N^{-1}G'X'XG$. If the design is rotatable then the moment matrices
before and after rotation are equal whatever the orthogonal matrix H from which G
is derived. As we have seen, this implies that the matrix $N^{-1}X'X$ must be of a
particular form. With the present definition of the interaction variables this
form is

(74)  $N^{-1}(X'X) = \begin{pmatrix}
      1 & \cdot & \mathbf{1}' & \cdot \\
      \cdot & I & \cdot & \cdot \\
      \mathbf{1} & \cdot & \lambda_4(2I + \mathbf{1}\mathbf{1}') & \cdot \\
      \cdot & \cdot & \cdot & 2\lambda_4 I
      \end{pmatrix},$

where the dots indicate null submatrices, I is the identity matrix and $\mathbf{1}$ is a
column vector with elements all unity. The partitioning in this and all moment
matrices that follow is after the 1st, (k + 1)th, and (2k + 1)th row and column.
This partitioning separates the elements corresponding to the constant term, the
first order terms, the quadratic terms and the interaction terms respectively.
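
The relation between H and its second Schlaflian matrix can be illustrated numerically. The sketch below is an assumption-laden illustration, not the authors' procedure: the helper names (`second_order_terms`, `schlaflian2`, `row_times_matrix`, `rotation_3d`) are hypothetical, and the block layout follows Eqs. (68) and (70) to (73) as reconstructed here. It checks that rotating a point and then forming its second order terms gives the same result as forming the terms first and multiplying by $H^{[2]}$.

```python
import math
from itertools import combinations

SQRT2 = math.sqrt(2.0)

def second_order_terms(x):
    """Quadratic terms x_i^2 followed by scaled interactions sqrt(2)*x_i*x_j (i < j)."""
    k = len(x)
    return ([xi * xi for xi in x]
            + [SQRT2 * x[i] * x[j] for i, j in combinations(range(k), 2)])

def schlaflian2(H):
    """Second Schlaflian matrix of H, laid out as [[alpha, beta], [gamma, delta]]."""
    k = len(H)
    pairs = list(combinations(range(k), 2))
    rows = []
    for i in range(k):                       # rows of (alpha | beta)
        rows.append([H[i][j] ** 2 for j in range(k)]
                    + [SQRT2 * H[i][j] * H[i][m] for j, m in pairs])
    for i, l in pairs:                       # rows of (gamma | delta)
        rows.append([SQRT2 * H[i][j] * H[l][j] for j in range(k)]
                    + [H[i][j] * H[l][m] + H[i][m] * H[l][j] for j, m in pairs])
    return rows

def row_times_matrix(row, M):
    """Product of a row vector with a matrix given as a list of rows."""
    return [sum(row[i] * M[i][j] for i in range(len(row))) for j in range(len(M[0]))]

def rotation_3d(a, b):
    """A 3 x 3 orthogonal matrix built from two plane rotations (angles a and b)."""
    ca, sa, cb, sb = math.cos(a), math.sin(a), math.cos(b), math.sin(b)
    Rz = [[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]]
    Rx = [[1.0, 0.0, 0.0], [0.0, cb, -sb], [0.0, sb, cb]]
    return [row_times_matrix(r, Rx) for r in Rz]

if __name__ == "__main__":
    H = rotation_3d(0.7, -0.4)
    x = [0.3, -1.2, 2.0]
    print(second_order_terms(row_times_matrix(x, H)))
    print(row_times_matrix(second_order_terms(x), schlaflian2(H)))
```

The two printed rows should coincide, and $H^{[2]}$ should itself be orthogonal whenever H is.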

7.3 Three dimensional designs. In three dimensions sets of n points equally 
spaced on a sphere are provided by the vertices of the five regular figures. These 
are the tetrahedron (n = 4), the octahedron (n = 6), the cube (n = 8), the 
icosahedron (n = 12), and the dodecahedron (n = 20). Using the method of the 
previous section we can study the moment matrices for these sets of points sub- 
ject to an arbitrary rotation. 

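For example, in the case of the tetrahedron we may suppose that initially the
coordinates of its four vertices, i.e., the rows of D, are (−1, −1, 1), (1, −1, −1),
(−1, 1, −1), and (1, 1, 1). We can then write down the matrix of independent
variables X for a second order model, the moment matrix $\tfrac14X'X$ and finally the
corresponding matrix $\tfrac14\tilde{X}'\tilde{X} = \tfrac14G'X'XG$ after an arbitrary rotation H has been
applied to D. We thus obtain

(75)  $\tfrac14\tilde{X}'\tilde{X} = \begin{pmatrix}
      1 & \cdot & \mathbf{1}' & \cdot \\
      \cdot & I & \sqrt2\,H'J\gamma & \sqrt2\,H'J\delta \\
      \mathbf{1} & \sqrt2\,\gamma'JH & \mathbf{1}\mathbf{1}' + 2\gamma'\gamma & 2\gamma'\delta \\
      \cdot & \sqrt2\,\delta'JH & 2\delta'\gamma & 2\delta'\delta
      \end{pmatrix},$

where J is a square matrix with unit elements along the diagonal running from
the bottom left-hand corner to the top right-hand corner and zero elements
elsewhere.

In a similar way for the other regular figures with scales adjusted so that
$\rho^2 = k = 3$ we consider equiradial points formed by the

      6 vertices of the octahedron $(\pm\sqrt3, 0, 0)$, $(0, \pm\sqrt3, 0)$, $(0, 0, \pm\sqrt3)$,
      8 vertices of the cube $(\pm1, \pm1, \pm1)$,
      12 vertices of the icosahedron $(0, \pm a, \pm b)$, $(\pm b, 0, \pm a)$, $(\pm a, \pm b, 0)$,
      20 vertices of the dodecahedron $(0, \pm c^{-1}, \pm c)$, $(\pm c, 0, \pm c^{-1})$, $(\pm c^{-1}, \pm c, 0)$,
            $(\pm1, \pm1, \pm1)$,

where a = 1.473, b = 0.911, and c = 1.618. For the octahedron and cube the
moment matrices after applying the general rotation are obtained by setting
k = 3 in the expressions:

(76)  Octahedron:  $\begin{pmatrix}
      1 & \cdot & \mathbf{1}' & \cdot \\
      \cdot & I & \cdot & \cdot \\
      \mathbf{1} & \cdot & k\alpha'\alpha & k\alpha'\beta \\
      \cdot & \cdot & k\beta'\alpha & k\beta'\beta
      \end{pmatrix},$

(77)  Cube:  $\begin{pmatrix}
      1 & \cdot & \mathbf{1}' & \cdot \\
      \cdot & I & \cdot & \cdot \\
      \mathbf{1} & \cdot & \mathbf{1}\mathbf{1}' + 2\gamma'\gamma & 2\gamma'\delta \\
      \cdot & \cdot & 2\delta'\gamma & 2\delta'\delta
      \end{pmatrix},$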


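and for the icosahedron and dodecahedron by setting n = 12 and 20 respectively
in the expression:

(78)  $n^{-1}\tilde{X}'\tilde{X} = \begin{pmatrix}
      1 & \cdot & \mathbf{1}' & \cdot \\
      \cdot & I & \cdot & \cdot \\
      \mathbf{1} & \cdot & \tfrac35(2I + \mathbf{1}\mathbf{1}') & \cdot \\
      \cdot & \cdot & \cdot & \tfrac65 I
      \end{pmatrix}.$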


As is to be expected, the vertices of the tetrahedron, octahedron, and cube do
not individually supply rotatable sets of order 2 although they all provide
rotatable sets of order 1. The larger numbers of points provided by the icosahedron
and the dodecahedron however provide rotatable sets of order 2. These have of
course the singular value of $\lambda_4 = 3/5$.
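
The statement for the icosahedron can be checked directly from its vertices. This is an illustrative sketch only: the function names `icosahedron_vertices` and `moment` are assumptions, and the vertex pattern is the cyclic arrangement of (0, ±a, ±b) with a/b equal to the golden ratio and $a^2 + b^2 = 3$. For a rotatable set of order 2 scaled so that [ii] = 1, the fourth order moments should satisfy [iiii] = 3[iijj] with $[iijj] = \lambda_4$.

```python
import math
from itertools import product

def icosahedron_vertices(rho_squared=3.0):
    """The 12 vertices (0, +-a, +-b), (+-b, 0, +-a), (+-a, +-b, 0) of a regular
    icosahedron whose squared radius a**2 + b**2 equals rho_squared."""
    golden = (1.0 + math.sqrt(5.0)) / 2.0
    b = math.sqrt(rho_squared / (1.0 + golden ** 2))   # about 0.911 when rho^2 = 3
    a = golden * b                                     # about 1.473 when rho^2 = 3
    vertices = []
    for s1, s2 in product((1.0, -1.0), repeat=2):
        vertices.append((0.0, s1 * a, s2 * b))
        vertices.append((s1 * b, 0.0, s2 * a))
        vertices.append((s1 * a, s2 * b, 0.0))
    return vertices

def moment(points, powers):
    """Average over the points of the product of coordinates raised to powers."""
    total = 0.0
    for pt in points:
        term = 1.0
        for coordinate, power in zip(pt, powers):
            term *= coordinate ** power
        total += term
    return total / len(points)

if __name__ == "__main__":
    pts = icosahedron_vertices()
    print(moment(pts, (2, 0, 0)), moment(pts, (4, 0, 0)), moment(pts, (2, 2, 0)))
```

Applying the same `moment` function to the eight cube vertices (±1, ±1, ±1) gives [iiii] = [iijj] = 1, so the 3 : 1 ratio fails there, consistent with the cube supplying only a rotatable set of order 1.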

As before therefore we may combine icosahedral and dodecahedral sets either
with points in the center or with one another to form second order rotatable
designs. These designs are built up in any one of the following ways:

(1) the 12 vertices of an icosahedron in any orientation with $n_1 \ge 1$ points at
the center,

(2) the 20 vertices of a dodecahedron in any orientation with $n_1 \ge 1$ points
at the center,

(3) the vertices of two concentric icosahedra of differing radii $\rho_1$ and $\rho_2$ each
in any relative and absolute orientation,

(4) the vertices of two concentric dodecahedra of differing radii $\rho_1$ and $\rho_2$
each in any relative and absolute orientation,

(5) the vertices of an icosahedron of radius $\rho_1$ together with the vertices of a
dodecahedron having the same center and radius $\rho_2$ each in any relative and
absolute orientation.

The choice of $n_1$ in designs (1) and (2) and of $\rho_1/\rho_2$ in designs (3), (4), and (5)
determines the value of $\lambda_4$ and hence via Eq. (48) the manner in which the
variance of $\hat{y}$ changes with $\rho$. In particular, the value given in Table 1 for k = 3 is
$\lambda_4 = 0.84$. This value is most nearly obtained (Eq. 59) with $n_1 = 5$ for design
(1), $n_1 = 8$ for design (2). Also from Eq. (58), $\rho_1/\rho_2$ should be 2.11 for designs
(3) and (4), and 2.85 or 0.530 for design (5).

As with the two dimensional designs, sets of points not themselves rotatable 
arrangements of order two may be combined to give second order rotatable de- 
signs. Of particular importance because of the existence of parallel designs in k 
dimensions are the designs obtained by combining the vertices of a concentric 
cube and octahedron. The relative orientation of the figures is such that each 
line joining the origin to a vertex of the cube pierces the center of a face of the 
octahedron and vice versa. These designs are special cases of the central com- 
posite designs described in references [1] and [13]. 

Since $H^{[2]}$ is orthogonal

(79)  $\alpha'\beta = -\gamma'\delta; \qquad \gamma'\gamma = I - \alpha'\alpha; \qquad \delta'\delta = I - \beta'\beta.$


Using these identities, it will be seen from the nature of the moment matrices 
of the octahedron and cube (76) and (77) that by suitably choosing the relative 







sizes of the figures a combined arrangement can be obtained whose moment
matrix is of the form (74) required for a rotatable design of second order. If
$\rho_a$ and $\rho_c$ are the radii of circumscribing spheres of the octahedron and cube
then $\rho_a/\rho_c = 2^{3/4}/3^{1/2} = 0.9710$ and $\lambda_4 = 0.6005$.

This value of $\lambda_4$ is very close to the singular value of k/(k + 2) = 0.6. By
adding points at the center however satisfactory values of $\lambda_4$ may be obtained. The
value $\lambda_4 = 0.84$ given in Table 1, which gives the same precision at $\rho = 0$ as at
$\rho = 1$, is most nearly obtained with 6 points at the center, and the value of
$\lambda_4 = 1$ required for "orthogonality" is most nearly attained with 9 points at the
center.

From the nature of the moment matrices in Eqs. (76), (77), and (78) it is 
seen that in general an infinity of three-dimensional second order rotatable de- 
signs can be generated by combining in various ways the vertices of octahedra, 
cubes, icosahedra and dodecahedra with or without added center points. One 
such design of considerable practical interest used by De Baun employs a cube 
with two added octahedra but no center points. 

7.4 Designs in more than 3 dimensions. In five or more dimensions there exist
only three regular figures. These are the regular simplex (k-dimensional analogue
of the tetrahedron having k + 1 vertices), the cross-polytope (k-dimensional
analogue of the octahedron having 2k vertices) and the measure polytope or
hypercube (k-dimensional analogue of the cube having $2^k$ vertices). In four
dimensions other regular figures occur. These have 24 vertices, 120 vertices, and 600
vertices. The figures with 120 and 600 vertices are of little interest from the
point of view of constructing usable experimental designs and the figure with
24 vertices may be obtained by combining the cross polytope with the hypercube.
Our discussion will therefore be confined to designs constructed from the vertices
of the regular simplex, the cross polytope and the hypercube.

The regular simplex always supplies a first order rotatable design and we shall 
show that the cross polytope and hypercube can always be combined to give an 
arrangement from which a second order rotatable design may be obtained. 

If we suppose the cross polytope and the hypercube each to have radius
$k^{1/2}$ and to be in 'standard orientation' so that the 2k vertices of the cross polytope
have coordinates

      $(\pm k^{1/2}, 0, 0, \dots, 0),\ (0, \pm k^{1/2}, 0, \dots, 0),\ \dots,\ (0, 0, 0, \dots, \pm k^{1/2})$

and the $2^k$ vertices of the cube have coordinates $(\pm1, \pm1, \dots, \pm1)$ then the
moment matrices, after applying the same rotation H, are those given in Eqs.
(76) and (77).

By combining the vertices of the cross polytope of radius $\rho_a$ with those of the
measure polytope of radius $\rho_c$ in the relative orientation indicated so that
$2k^2\rho_a^4 = 2^{k+1}\rho_c^4$, that is so that $\rho_a/\rho_c = 2^{k/4}/k^{1/2}$, an arrangement with the
desired moment matrix (74) results.

These rotatable arrangements when combined together or augmented with 
suitable numbers of points at the center provide second order rotatable designs 
of great practical value. In their “standard orientation”’ the resulting designs are 







particular examples of the composite designs discussed in references [1] and 
[13] and consequently lend themselves very conveniently to sequential experi- 
mentation. As is discussed more fully in [1] they may be built up in parts each of 
which supplies valuable interim information. They are particularly simple to use. 
In standard orientation the part of the design corresponding to the measure
polytope or hypercube defines a set of experimental points which follow the familiar
$2^k$ factorial design. To form the second order rotatable designs these are
augmented with points at the center and with points corresponding to the cross
polytope in which all the variables except one are held in turn at the 'center'
levels, the remaining variable being maintained first at a level above its center
value and then at a level below its center value. Because in standard orientation
the latter points lie along the coordinate axes they may be referred to as "axial
points".

When k is sufficiently large a suitable fractional replicate can replace the full
hypercube, since a second order moment matrix identical with that of the full
factorial will be obtained with any fractional replicate of the $2^k$ design in which
no effects of second order or lower order are confounded. The only result of this
substitution will be to affect the alias matrix of the design.

For a k dimensional design with $n_a = 2k$ axial points, $n_c = 2^{k-p}$ points in the
$(\tfrac12)^p$ replicate of the $2^k$ factorial, and $n_0$ points at the center,
$\rho_a/\rho_c = n_c^{1/4}/k^{1/2}$ and

(81)  $\lambda_4 = N/\{n_c + 4(1 + n_c^{1/2})\},$

where $N = n_0 + n_c + n_a$.
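
Equation (81) can be compared with the moments computed directly from the design points. The following is a minimal sketch under stated assumptions, not a reference implementation: the function names are hypothetical, the full $2^k$ factorial is used for the cube portion, the axial distance is taken as $n_c^{1/4}$, and $\lambda_4$ is computed as $[iijj]/[ii]^2$, which is its value once the design is scaled so that [ii] = 1.

```python
from itertools import product

def ccd_points(k, n_0=0):
    """Cube points (+-1, ..., +-1), axial points at distance n_c**0.25, and
    n_0 center points of a central composite design in k dimensions."""
    cube = [tuple(float(s) for s in signs) for signs in product((1, -1), repeat=k)]
    alpha = len(cube) ** 0.25
    axial = []
    for i in range(k):
        for sign in (1.0, -1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(tuple(point))
    center = [tuple([0.0] * k)] * n_0
    return cube + axial + center

def lambda4_formula(n_c, n_a, n_0=0):
    """lambda_4 from Eq. (81)."""
    N = n_0 + n_c + n_a
    return N / (n_c + 4.0 * (1.0 + n_c ** 0.5))

def lambda4_from_points(points):
    """lambda_4 computed as [1122] / [11]**2 from the unscaled points."""
    N = len(points)
    m2 = sum(p[0] ** 2 for p in points) / N
    m22 = sum(p[0] ** 2 * p[1] ** 2 for p in points) / N
    return m22 / m2 ** 2

if __name__ == "__main__":
    for n_0 in (0, 6, 9):
        pts = ccd_points(3, n_0)
        print(n_0, lambda4_formula(8, 6, n_0), lambda4_from_points(pts))
```

For k = 3 with no center points both routes should give a value close to the 0.6005 quoted for the cube and octahedron combination.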
When using these designs in their standard orientation it is simplest to regard
them as scaled so that the hypercube or fractional hypercube has its coordinates
equal to plus or minus unity. The N points of the complete design are then

      $n_c$ points ($2^k$ factorial or suitable fraction): $(\pm1, \pm1, \dots, \pm1)$,
      $n_a$ points (axial points): $(\pm\alpha, 0, 0, \dots, 0)$, $(0, \pm\alpha, 0, \dots, 0)$, ..., $(0, 0, \dots, \pm\alpha)$,
      $n_0$ points (center conditions): $(0, 0, 0, \dots, 0)$.

Then $\alpha = n_c^{1/4}$ and the scale factor c of Eq. (3) is $N/(n_c + 2n_c^{1/2})$. The values
of $n_c$, $n_a$, $n_0$, $\alpha$, $\rho_a/\rho_c$ and $\lambda_4$ for second order rotatable designs, which give the
values of $\lambda_4$ set out in Table 1 and for orthogonal designs, are given in Table 3
for k = 2, 3, 4, ..., 8 dimensions.


8. Arrangement of the designs in blocks. To avoid bias due to systematic 
disturbances the complete set of experimental trials corresponding to the points 
which form the design could be performed in random order. Frequently how- 
ever it is possible to carry out limited groups of trials under more homogeneous 
conditions than can be attained for the complete set. It may then be possible to 
achieve greater accuracy by performing the designs in blocks, carrying out the 
individual trials within each block in random order. A block may for example 
refer to a group of experiments performed on the same day, or a group of experi- 
ments for which it was possible to use the same batch of starting material. 





TABLE 3

Central composite rotatable second order designs

(The body of this table is not recoverable from the source. Its legible row labels
include $n_0$ (Table 1), $n_0$ (Orthogonal) and total N, and its legible column
headings include the fractional cases 5(1/2 rep), 6(1/2 rep) and 8(1/2 rep).)


We shall show how the designs we have discussed may be performed in or- 
thogonal blocking arrangements. On the usual assumption, that the effect of 
carrying out a particular trial in one block rather than another is merely to 
change the expected value of the response by a fixed amount which depends only 
on the particular blocks involved, such arrangements insure that the estimated 
coefficients of the polynomial are completely independent of the block differences 
and their standard errors depend on the within-block variance only. 

For N experimental points assigned to m blocks with $n_w$ points in the wth
block we suppose that

      $\eta_u = \sum_{w=1}^{m} \beta_{0w} z_{wu} + \sum_{i=1}^{k} \beta_i x_{iu} + \sum_{i=1}^{k}\sum_{j \ge i}^{k} \beta_{ij} x_{iu} x_{ju},$

where $\beta_{0w}$ is the expected value of the response in the wth block at the
experimental conditions corresponding to the origin of the design, and $z_{wu}$ is a "dummy"
variable taking the value unity for those experimental points which fall in the
wth block and zero for all other experimental points. We shall call $x_i$, $x_{ij}$, etc.,
the polynomial variables and $z_1$, $z_2$, etc., the block variables.

In whatever manner the experimental points are assigned to the blocks we can 
except in the “‘pathological” cases detailed below estimate the coefficients of the 
polynomial equation allowing for the block effects, by the method of least squares. 
However the manner in which the experimental points are assigned to the blocks 
profoundly affects the efficiency of estimation and the ease with which the esti- 
mates are calculated. In particular, if it is possible so to allocate the experimental 
points to the blocks that the block variables and polynomial variables are 
orthogonal, then the inclusion of blocks does not at all influence the estimation 
of the polynomial coefficients and the only effect of blocking is the wholly de- 
sirable one of limiting the experimental error to that occurring within blocks. The 
analysis of the experiment proceeds exactly as if there were no blocking, except 
that in the estimation of the residual error the contribution due to blocks is sub- 
tracted from the residual sum of squares. 

In order to find the conditions that must be satisfied to allow orthogonal
blocking it is simplest to rewrite the model in the equivalent form

      $\eta_u = \beta_0 + \sum_{i=1}^{k} \beta_i x_{iu} + \sum_{i=1}^{k}\sum_{j \ge i}^{k} \beta_{ij} x_{iu} x_{ju}
              + \sum_{w=1}^{m} \delta_w (z_{wu} - \bar{z}_w),$

where

      $\beta_0 = \sum_{w=1}^{m} \frac{n_w}{N}\,\beta_{0w}, \qquad \delta_w = \beta_{0w} - \beta_0.$

We note that $z_{wu} - \bar{z}_w$ is equal to $1 - n_w/N$ when the uth set of conditions is in
the wth block, and to $-n_w/N$ otherwise.

The conditions for orthogonal blocks, that is to say the conditions that the
block variables $z_{wu} - \bar{z}_w$ shall be orthogonal to the variables

      $x_1, x_2, \dots, x_k, x_1^2, x_2^2, \dots, x_k^2, x_1x_2, \dots$

in the second degree polynomial, can be written





(83)  $\sum_{u=1}^{N} x_{iu} x_{ju} (z_{wu} - \bar{z}_w) = 0,$

that is

(84)  $\sum_{u=1}^{N} x_{iu} x_{ju} z_{wu} = \bar{z}_w \sum_{u=1}^{N} x_{iu} x_{ju}.$

Now for any second order rotatable design, if $i \ne j$, $\sum_{u=1}^{N} x_{iu} x_{ju} = 0$, whence
for orthogonal blocking we require

(85)  $\sum_{u}^{n_w} x_{iu} x_{ju} = 0, \qquad i \ne j,\ w = 1, 2, \dots, m,$

where the summation includes only those values of u in the wth block. Thus (1)
all the sums of products between $x_0, x_1, \dots, x_k$ must be zero for each block.

A second condition arises from putting i = j in (84) whence

(86)  $\dfrac{\sum_{u}^{n_w} x_{iu}^2}{\sum_{u=1}^{N} x_{iu}^2} = \dfrac{n_w}{N},$

where the summation in the numerator again is for those values of u in the wth
block. Hence (2) the fraction of the total sum of squares for each variable contributed
by each block must be proportional to the number of observations in each block.

8.1 Examples of orthogonal blocking with designs based on equiradial sets of
points. Where the rotatable design consists of an equiradial set of points with
added points at the center, we can satisfy both conditions (85) and (86) if we can
divide the equiradial set into subsets which are themselves first order rotatable
(i.e., first order orthogonal) designs. When this is possible the sum of squares for
each variable in each subset will be proportional to the number of points in the
subset, and we have only to add a number of center points to each subset
proportional to the number of points it already possesses to obtain an orthogonal block.

For example, consider a two dimensional "hexagonal" design consisting of 6
points at the vertices of a hexagon together with 2p center points. We can perform
this design in two orthogonal blocks each consisting of three points at the
vertices of an equilateral triangle and the remaining p points at the center.
Similarly an octagonal design (the two dimensional rotatable composite design of
Table 3) can be divided into two sets of four points at the vertices of a square
with an equal number of center points added to each to form two blocks. A
"nonagonal" design may be divided into three equilateral triangles, which form the
basis of three blocks, and so on.

In three dimensions the vertices of the dodecahedron may be divided into five
sub-sets each of which comprises the vertices of a tetrahedron. Thus for example a
design consisting of twenty points at the vertices of the dodecahedron plus 5p
points at the center is divisible into five orthogonal blocks of 4 + p points. Each
block consists of a complete tetrahedron plus p center points.
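
Conditions (85) and (86) can be tested mechanically for any proposed division into blocks. The sketch below is only an illustration under stated assumptions: the function names `hexagon_blocks` and `blocking_is_orthogonal`, the tolerance, and the use of alternate hexagon vertices to form the two triangles are choices made for the example. The variable $x_0 = 1$ is prepended to each point so that the sums of products involving $x_0$ are also checked.

```python
import math

def hexagon_blocks(p=1, rho=1.0, extra_center_in_first=0):
    """Two blocks, each an equilateral triangle of alternate hexagon vertices
    plus p center points (optionally with extra center points in block 1)."""
    blocks = [[], []]
    for u in range(6):
        theta = 2.0 * math.pi * u / 6
        blocks[u % 2].append((rho * math.cos(theta), rho * math.sin(theta)))
    for index, block in enumerate(blocks):
        extra = extra_center_in_first if index == 0 else 0
        block.extend([(0.0, 0.0)] * (p + extra))
    return blocks

def blocking_is_orthogonal(blocks, tol=1e-9):
    """Check conditions (85) and (86) for a list of blocks of design points."""
    k = len(blocks[0][0])
    N = sum(len(block) for block in blocks)
    augmented = [[(1.0,) + tuple(pt) for pt in block] for block in blocks]
    total_ss = [sum(row[i] ** 2 for block in augmented for row in block)
                for i in range(k + 1)]
    for block in augmented:
        for i in range(k + 1):
            for j in range(i + 1, k + 1):
                # Condition (85): sums of products vanish within each block.
                if abs(sum(row[i] * row[j] for row in block)) > tol:
                    return False
            # Condition (86): share of the sum of squares equals n_w / N.
            share = sum(row[i] ** 2 for row in block) / total_ss[i]
            if abs(share - len(block) / N) > tol:
                return False
    return True

if __name__ == "__main__":
    print(blocking_is_orthogonal(hexagon_blocks(p=2)))
    print(blocking_is_orthogonal(hexagon_blocks(p=2, extra_center_in_first=1)))
```

With equal numbers of center points in the two triangular blocks the check should succeed; giving one block an extra center point breaks condition (86) and the check should fail.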







8.2 Blocking with composite rotatable designs. The important composite rotatable 
designs lend themselves very conveniently to blocking and some valuable work 
on this topic has been carried out independently by De Baun [15]. Because the 
number of center points in any block must be an integer, exact rotatability and 
exact orthogonality between quadratic variables and block variables is not always 
attainable. We can however insure that either one of these desiderata is exactly 
satisfied and the other one nearly so. Although the extra labor involved in the 
calculations due to the slight non-orthogonality is not very great and the loss of 
efficiency is negligible, it is simplest in practice to use designs in which the block 
effects are exactly orthogonal but the condition of rotatability is slightly relaxed. 

The central composite design in standard orientation consists of $n_c$ points at the
vertices of a cube corresponding to a $2^k$ factorial arrangement or some suitable
fraction of it with coordinates $(\pm1, \pm1, \dots, \pm1)$, together with $n_a = 2k$
"axial" points with coordinates $(\pm\alpha, 0, \dots, 0)$, $(0, \pm\alpha, \dots, 0)$, ...,
$(0, 0, \dots, \pm\alpha)$, and $n_0$ points at the center with coordinates $(0, 0, \dots, 0)$.
If $\alpha = n_c^{1/4}$ the design is rotatable, but let us for the moment not assume that
this is so.

The sets of points at the vertices of the cube and the set of the axial points are 
each first order rotatable designs. These two parts of the design thus provide a 
basis for a first division of the composite design into two blocks. The blocking 
will be orthogonal if it is possible to allocate the center points to the two parts so 
that the total number of points in each part is proportional to the sum of squares 
for each variable contributed by that part. If $n_{c0}$ and $n_{a0}$ are the numbers of center points in the cubic part and the axial part respectively then we require

(87)  $\dfrac{2\alpha^2}{n_c} = \dfrac{n_a + n_{a0}}{n_c + n_{c0}}$;

thus for any composite design with

(88)  $\alpha = \left\{ \dfrac{n_c (n_a + n_{a0})}{2 (n_c + n_{c0})} \right\}^{1/2}$

we obtain orthogonal blocking. Now for rotatability $\alpha = n_c^{1/4}$ and hence to achieve both orthogonal blocking and rotatability we require that

(89)  $\dfrac{n_c^{1/2}}{2} = \dfrac{n_c + n_{c0}}{n_a + n_{a0}}$.


The set of axial points cannot be further sub-divided into sets which are first 
order rotatable designs. Such sub-division is possible however for the set of 
points at the vertices of the cube provided a system of confounding for the two- 
level factorial or fractional factorial design exists such that all the comparisons 
confounded correspond to interactions between three or more variables. If this 
is so the comparisons confounded will be unassociated with the comparisons used 
to estimate the coefficients of the polynomial. Also, since the comparisons con- 
founded are the defining contrasts of the sub-sets regarded as fractional factorials 
and correspond to interactions between three or more factors, it follows that the 







sub-sets are individually first order rotatable designs. If the cube is divided up 


into sub-sets each containing the same number of points then in accordance with 
Eq. (86) we must add an equal number of center points to each sub-set to main- 
tain orthogonality. 

8.3 Examples. 

(i) We first consider the four-dimensional design to illustrate the situation 
where both rotatability and orthogonality of blocking may be attained. From 
Table 4 it is seen that in standard orientation this design consists of the $2^4$ factorial, with coordinates $(\pm 1, \pm 1, \pm 1, \pm 1)$, 8 axial points at distance $\alpha = 16^{1/4} = 2$ units from the center, and to approximately satisfy the requirement of Table 1 seven points added at the center.

We can achieve orthogonal blocking and rotatability if we can satisfy Eq. 
(89) which gives 


(90)  $2 = \dfrac{16 + n_{c0}}{8 + n_{a0}}$.


Now the total number of points at the center $n_0 = n_{c0} + n_{a0}$ is not critical and if, for example, we use six points at the center instead of seven the only effect will be to slightly change the variance function and in particular to decrease slightly the precision near the origin. If we now allocate $n_{c0} = 4$ points to the cube and $n_{a0} = 2$ points to the axial part, we satisfy (90). In this way the complete set of 16 + 8 + 6 = 30 points is divided into two orthogonal blocks, one containing 16 + 4 = 20 points and one with 8 + 2 = 10 points. Now the $2^4$ factorial designs corresponding to the cube may be further divided into two halves each of
which is a rotatable design of order 1. This may be done without affecting the 
estimation of the polynomial coefficients by arranging that the contrast between 
the two halves corresponds to a three or four-factor interaction. The most suit- 
able contrast is the four-factor interaction and to effect the division we allocate 
those points in the design for which the product $x_1 x_2 x_3 x_4$ is $+1$ to one part and those for which it is $-1$ to the other. We may in accordance with Eq. (86) maintain
orthogonal blocking by dividing the four center points assigned to the cube equally 
between the parts, two to each half. Finally therefore the design of 30 points is 
carried out in 3 blocks each of ten points consisting of the axial points together 
with two center points and the two half-replicates of the cube each with two cen- 
ter points. The blocking is completely orthogonal and the design exactly ro- 
tatable. It should be noted that since the separate blocks are themselves first 
order rotatable designs, this scheme ensures orthogonal blocking not only in the 
standard orientation of the design which we have specifically discussed but also 
in every other orientation. 
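
The three-block arrangement of example (i) can be checked numerically. The following is a minimal sketch, not part of the original paper: the names `blocks`, `mean_square` and `first_order_orthogonal` are illustrative assumptions. It builds the 30 points and confirms that every block has the same mean square for each variable (the proportionality condition for orthogonal blocking) and that each block is a first order orthogonal design.

```python
from itertools import product

K = 4
ALPHA = 2.0  # 16 ** 0.25, the rotatable axial distance for k = 4
CENTER = (0.0,) * K


def sign_product(point):
    """Product x1*x2*x3*x4, the four-factor interaction contrast."""
    result = 1.0
    for value in point:
        result *= value
    return result


# 16 cube points and 8 axial points in standard orientation
cube = list(product((-1.0, 1.0), repeat=K))
axial = []
for i in range(K):
    for a in (-ALPHA, ALPHA):
        axial.append(tuple(a if j == i else 0.0 for j in range(K)))

# Two half-replicates of the cube and the axial set, each with two center points
blocks = [
    [p for p in cube if sign_product(p) > 0] + [CENTER] * 2,
    [p for p in cube if sign_product(p) < 0] + [CENTER] * 2,
    axial + [CENTER] * 2,
]


def mean_square(block, i):
    """Sum of squares of variable i in the block divided by the block size."""
    return sum(p[i] ** 2 for p in block) / len(block)


def first_order_orthogonal(block):
    """True if all first order sums and cross-product sums vanish in the block."""
    sums_zero = all(sum(p[i] for p in block) == 0 for i in range(K))
    cross_zero = all(
        sum(p[i] * p[j] for p in block) == 0
        for i in range(K)
        for j in range(i + 1, K)
    )
    return sums_zero and cross_zero


block_sizes = [len(b) for b in blocks]
mean_squares = [[mean_square(b, i) for i in range(K)] for b in blocks]
orthogonal_flags = [first_order_orthogonal(b) for b in blocks]
print(block_sizes, mean_squares, orthogonal_flags)
```

Equal mean squares across blocks mean that the block contrasts are uncorrelated with the quadratic variables, which is why the block effects do not disturb the estimates of the polynomial coefficients.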

(ii) To illustrate the situation where orthogonality of blocking and rotatability 
are not exactly attainable simultaneously, consider the three-dimensional composite design. From Table 3 we see that, in standard orientation, the design consists of the $2^3$ factorial with coordinates $(\pm 1, \pm 1, \pm 1)$, the six axial points each at a distance $8^{1/4} = 1.6818$ and about six center points are needed to satisfy approximately the requirements of Table 1. From (89), to achieve orthogonal blocking and rotatability we require

(91)  $\dfrac{8^{1/2}}{2} = \dfrac{8 + n_{c0}}{6 + n_{a0}}$,

where $n_{c0} + n_{a0}$ is about 6. We come nearest to satisfying this requirement by putting $n_{c0} = 4$ and $n_{a0} = 2$; however, since the equation cannot be exactly
satisfied with integral values, some slight nonorthogonality must occur. This 
non-orthogonality would be the same in every orientation of the design and 
the estimates of the coefficients corrected for block effects would in fact be very 
easily obtained without much extra labor. However, since exact rotatability is not required in practice, we choose to adjust the value of $\alpha$ to obtain orthogonality at the expense of rotatability. From Eq. (88) it is seen that for $n_{c0} = 4$, $n_{a0} = 2$ and for orthogonal blocking we require

(92)  $\alpha = \left\{ \dfrac{8 \times 8}{2 \times 12} \right\}^{1/2} = 1.6330$

instead of $\alpha = 1.6818$ required for rotatability. For this value of $\alpha$ the variance
contours will not be exact spheres; the difference from sphericity will however be
so slight as to be of no practical importance. Since the sub-groups are first order 
rotatable designs the blocking will remain orthogonal in every orientation. The 
elements corresponding to the constant terms and second order terms in both the 
moment matrix and the precision matrix will change slightly as the design is 
rotated however. 

As before the cube part may be divided into two portions. These are the two 
fractional replicates whose defining contrast is the three-factor interaction. Since 
four center points are allocated to the cube we can divide these equally between 
the fractions and so maintain orthogonality. 

Finally therefore the 20 points of the design with $\alpha = 1.6330$ may be divided into three orthogonal blocks. One block consists of eight points containing the six axial points and two of the center points, and the other two blocks each contain a
half replicate of the cube together with two center points. A list of orthogonal 
blocking arrangements for rotatable and near-rotatable composite designs is 
given in Table 4 below. 
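
The two axial distances compared in examples (i) and (ii) follow directly from Eq. (88) and from the rotatability condition. A minimal sketch, with function names chosen here for illustration only:

```python
import math


def alpha_rotatable(n_c):
    """Axial distance giving rotatability: alpha = n_c ** (1/4)."""
    return n_c ** 0.25


def alpha_orthogonal(n_c, n_a, n_c0, n_a0):
    """Axial distance giving orthogonal blocking, Eq. (88)."""
    return math.sqrt(n_c * (n_a + n_a0) / (2.0 * (n_c + n_c0)))


# Example (ii): k = 3, cube of 8 points, 6 axial points, 4 + 2 center points
a3_orth = alpha_orthogonal(8, 6, 4, 2)
a3_rot = alpha_rotatable(8)

# Example (i): k = 4, cube of 16 points, 8 axial points, 4 + 2 center points
a4_orth = alpha_orthogonal(16, 8, 4, 2)
a4_rot = alpha_rotatable(16)

print(round(a3_orth, 4), round(a3_rot, 4), round(a4_orth, 4), round(a4_rot, 4))
```

For k = 4 the two values coincide, which is the case where both properties hold exactly; for k = 3 they differ in the second decimal place.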

An aspect which makes these blocking arrangements particularly useful arises 
out of the nature of the situation in which these designs are often used. In the 
exploration of response surfaces [1], [13] trials are usually performed sequentially 
and often have as their object the increase or maximization of some response. It 
has been shown that each block of the second order design is itself a first order 
rotatable design with added points at the center. Such a design allows estimates 
to be obtained not only of first order effects but also (assuming a second order 
equation is adequate to represent the surface) of the sum of the quadratic effects. For if $\bar{y}_0$ is the mean of the observations at the center and $\bar{y}_a$ the mean of the observations in the first order design then using (8a) it will be found that for any





TABLE 4

Blocking arrangements for rotatable and near-rotatable central composite designs

Columns (k): 2*, 3, 4, 5, 5 (1/2 rep), 6, 6 (1/2 rep), 7, 7 (1/2 rep)

Blocks within cube
  n_c: Number of points in cube                [entries illegible in source]
  Number of blocks in cube                     [entries illegible in source]
  Number of points in block from cube          [entries illegible in source]
  Number of added center points                [entries illegible in source]
  Total number of points in block              [entries illegible in source]

Axial block
  n_a: Number of axial points                  [entries illegible in source]
  Number of added points                       [entries illegible in source]
  Total number of points in block              [entries illegible in source]

Grand total of points in the design            14      20      30      54      33      90      54      169     [illegible]

Value of α for orthogonal blocking             1.4142  1.6330  2.0000  2.3664  2.0000  2.8284  2.3664  3.3636  2.8284

Value of α for rotatability                    1.4142  1.6818  2.0000  2.3784  2.0000  2.8284  2.3784  3.3333  2.8284

* A more economical design for k = 2 (which is not however a composite design as here defined) is that mentioned in Section 7.1, in which 6 points at the vertices of a hexagon are divided into two sets of three points. To attain the value of $\lambda_4$ given in Table 1 two points are added to each set to form two blocks.


first order rotatable design in any orientation (that is, for any orthogonal first order design),

$E(\bar{y}_a - \bar{y}_0) = \sum_{i=1}^{k} \beta_{ii}$.

In the neighborhood of a true maximum $\bar{y}_a - \bar{y}_0$ is not small relative to the linear coefficients $b_i$ and this provides an additional indication of the inadequacy of a linear model.

For example suppose that four variables were studied with the intention of finding an optimum set of conditions using the methods of Box and Wilson [1]. A half replicate of the $2^4$ design with two center points could first be run in random order. This design would supply estimates of the four first order terms $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ and combinations of the interaction terms namely $(\beta_{12} + \beta_{34})$, $(\beta_{13} + \beta_{24})$, $(\beta_{14} + \beta_{23})$ and of the sum of the four quadratic terms $\sum \beta_{ii}$. If







first order terms were dominant, progress could probably be made without a 
more elaborate design and moves in the indicated direction of increasing yield 
would be made until improvement in that direction was exhausted. At this point 
a new first order design would be begun at the improved set of conditions. 

If the first order terms were not dominant, or if more precision were needed, 
or a tentative move had not met with success, the second half of the factorial 
design together with two further center points could be carried out, and the situa- 
tion again reviewed in the light of the set of 20 trials now available. Finally, if 
the evidence indicated that further progress could be attained only by fitting the 
second degree equation, the third block of ten experiments consisting of the 
axial points and two center points would be added. The complete set of 30 
trials could then be used to provide efficient estimates of the best fitting second 
degree equation and further progress would follow. 


9. Details of calculations using the designs. From the observations generated 
by the second-order designs least squares estimates of the coefficients of the fitted 
polynomial, together with their variances and the appropriate analysis of vari- 
ance table, are readily computed. 
9.1 Estimates of the coefficients and of their standard errors. To estimate the coefficients we require only the sums of products of the observations with the independent variables. We shall use the notation

$\sum_{u=1}^{N} x_{0u} y_u = \{0y\}, \quad \sum_{u=1}^{N} x_{iu} y_u = \{iy\}, \quad \sum_{u=1}^{N} x_{iu} x_{ju} y_u = \{ijy\},$

where $i, j = 1, 2, \ldots, k$.


(i) Rotatable designs. The form of the inverse matrix for any rotatable design is given by (44) whence we have for any k-dimensional second order rotatable design with parameter $\lambda_4$,

(93)  $b_0 = A N^{-1} \left[ 2\lambda_4^2 (k+2) \{0y\} - 2\lambda_4 c \sum_{i=1}^{k} \{iiy\} \right]$,

(94)  $b_i = c N^{-1} \{iy\}$,

(95)  $b_{ii} = A c N^{-1} \left[ c\{(k+2)\lambda_4 - k\} \{iiy\} + c(1-\lambda_4) \sum_{j=1}^{k} \{jjy\} - 2\lambda_4 \{0y\} \right]$,

(96)  $b_{ij} = c^2 N^{-1} \lambda_4^{-1} \{ijy\}$,

where

(97)  $A = [2\lambda_4 \{(k+2)\lambda_4 - k\}]^{-1}$

and the scale factor

(98)  $c = N \Big/ \sum_{u=1}^{N} x_{iu}^2$.


Again using (44) the variances of the estimates are

(99)  $V(b_0) = 2A\lambda_4^2 (k+2) \sigma^2 / N$,

(100)  $V(b_i) = c\sigma^2 / N$,

(101)  $V(b_{ii}) = A\{(k+1)\lambda_4 - (k-1)\} c^2 \sigma^2 / N$,

(102)  $V(b_{ij}) = c^2 \sigma^2 / (N\lambda_4)$.

These formulae apply to any rotatable design. For the particular case of the rotatable central composite designs scaled so that in standard orientation the coordinates of the $n_c$ points forming the two-level factorial or fractional factorial part are $(\pm 1, \pm 1, \ldots, \pm 1)$ the value of the scale factor $c$ is $N/(n_c + 2n_c^{1/2})$.

(ii) Central composite designs not necessarily rotatable. In order to obtain ex- 
actly orthogonal blocking for the central composite designs, we have proposed 
certain arrangements which are not exactly rotatable. For any central composite 
design the estimates of the coefficients and their variances are readily computed 
and these are of course unaffected by orthogonal blocking. 

The non-diagonal elements of the moment matrix for any central composite design in standard orientation are zero apart from terms arising from cross products between the "constant term" variable $x_0$ and the quadratic variables $x_i^2$ ($i = 1, 2, \ldots, k$). The sub-matrix of the moment matrix corresponding to these variables is of the form

(103)  $\begin{bmatrix} d & e & e & \cdots & e \\ e & f & g & \cdots & g \\ e & g & f & \cdots & g \\ \vdots & \vdots & \vdots & & \vdots \\ e & g & g & \cdots & f \end{bmatrix}$.

The reciprocal of this matrix is of the same form. Denoting its elements in corresponding positions by capital letters we find

(104)  $D = H^{-1}\{f + (k-1)g\}(f - g)$,
(105)  $E = -H^{-1}e(f - g)$,
(106)  $F = H^{-1}\{df + (k-2)dg - (k-1)e^2\}$,
(107)  $G = H^{-1}(e^2 - dg)$,
(108)  $H = (f - g)\{df + (k-1)dg - ke^2\}$.

For the composite designs in conventional scaling

(109)  $d = N, \quad e = n_c + 2\alpha^2, \quad f = n_c + 2\alpha^4, \quad g = n_c$.


Using (109) with (104), (105), (106), (107), and (108) the estimated coefficients are readily calculated from the formulae

(110)  $b_0 = D\{0y\} + E\sum_{i=1}^{k}\{iiy\}$,

(111)  $b_i = (n_c + 2\alpha^2)^{-1}\{iy\}$,

(112)  $b_{ii} = E\{0y\} + F\{iiy\} + G\sum_{j \ne i}\{jjy\}$,

(113)  $b_{ij} = n_c^{-1}\{ijy\}$.

The variances of these estimates are

(114)  $V(b_0) = D\sigma^2$,

(115)  $V(b_i) = (n_c + 2\alpha^2)^{-1}\sigma^2$,

(116)  $V(b_{ii}) = F\sigma^2$,

(117)  $V(b_{ij}) = n_c^{-1}\sigma^2$.

In practice we normally estimate $\sigma^2$ from the data in the manner described below.
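
The closed-form elements (104) to (108) can be verified by multiplying the patterned matrix by its claimed reciprocal, and for a rotatable design they should agree with the variances (99) and (101). The sketch below is illustrative only: the helper names are not from the paper, and the expression used for $\lambda_4$ assumes the scaling convention under which $\sum_u x_{iu}^2 x_{ju}^2 = N\lambda_4$ in the scaled variables.

```python
def ccd_elements(k, n_c, alpha, n_total):
    """Return (d, e, f, g) of Eq. (109) and (D, E, F, G) of Eqs. (104)-(108)."""
    d = float(n_total)
    e = n_c + 2.0 * alpha ** 2
    f = n_c + 2.0 * alpha ** 4
    g = float(n_c)
    h = (f - g) * (d * f + (k - 1) * d * g - k * e ** 2)
    big_d = (f + (k - 1) * g) * (f - g) / h
    big_e = -e * (f - g) / h
    big_f = (d * f + (k - 2) * d * g - (k - 1) * e ** 2) / h
    big_g = (e ** 2 - d * g) / h
    return (d, e, f, g), (big_d, big_e, big_f, big_g)


def patterned(k, corner, border, diag, off):
    """(k+1) x (k+1) matrix with the pattern of the sub-matrix (103)."""
    size = k + 1
    m = [[off] * size for _ in range(size)]
    for i in range(1, size):
        m[i][i] = diag
        m[0][i] = border
        m[i][0] = border
    m[0][0] = corner
    return m


def max_identity_error(a, b):
    """Largest absolute deviation of the product a*b from the identity."""
    size = len(a)
    worst = 0.0
    for i in range(size):
        for j in range(size):
            entry = sum(a[i][t] * b[t][j] for t in range(size))
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(entry - target))
    return worst


# Example (ii): k = 3, alpha = (8/3) ** 0.5 = 1.6330 (near-rotatable), N = 20
small3, big3 = ccd_elements(3, 8, (8.0 / 3.0) ** 0.5, 20)
err3 = max_identity_error(patterned(3, *small3), patterned(3, *big3))

# Example (i): k = 4, alpha = 2 (rotatable), N = 30
k, n_c, alpha, n_total = 4, 16, 2.0, 30
small4, (D4, E4, F4, G4) = ccd_elements(k, n_c, alpha, n_total)
err4 = max_identity_error(patterned(k, *small4), patterned(k, D4, E4, F4, G4))

# Rotatable-design formulas (97), (99), (101) with sigma^2 = 1
c = n_total / (n_c + 2.0 * alpha ** 2)   # scale factor
lam4 = c ** 2 * n_c / n_total            # assumed definition of lambda_4
A = 1.0 / (2.0 * lam4 * ((k + 2) * lam4 - k))
v_b0 = 2.0 * A * lam4 ** 2 * (k + 2) / n_total
v_bii = A * ((k + 1) * lam4 - (k - 1)) * c ** 2 / n_total

print(err3, err4, D4, v_b0, F4, v_bii)
```

Agreement of `D4` with `v_b0` and of `F4` with `v_bii` shows that the general composite-design formulas reduce to the rotatable ones when $\alpha = n_c^{1/4}$.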

9.2 Analysis of variance. The total sum of squares $\sum_{u=1}^{N} y_u^2 = S$ having N degrees of freedom may be split into two parts: (i) The sum of squares $S_{012}$ attributable to the fitted second degree equation having $(\tfrac{1}{2})(k + 2)(k + 1)$ degrees of freedom. This is given by

(118)  $S_{012} = b_0\{0y\} + \sum_{i=1}^{k} b_i\{iy\} + \sum_{i=1}^{k}\sum_{j \ge i} b_{ij}\{ijy\}$.

(ii) The "overall" residual sum of squares R having $N - (\tfrac{1}{2})(k + 2)(k + 1)$ degrees of freedom which is obtained by difference

(119)  $R = S - S_{012}$.


Each of these sums of squares may be further subdivided. 

9.21 Analysis of sum of squares due to regression. The sum $S_{012}$ may be divided into three parts:

(i) The sum $S_0$, having one degree of freedom, attributable to the fitting of a polynomial of zero-th degree (the so-called "correction due to the mean")

(120)  $S_0 = \{0y\}^2/N$.

(ii) The sum $S_{1.0}$ having k degrees of freedom. This is the "extra" sum of squares associated with the fitting of a first degree polynomial. Since, for the designs we have considered, $b_0$ is uncorrelated with any of the $b_i$,

(121)  $S_{1.0} = \sum_{i=1}^{k} b_i\{iy\}$.

(iii) The sum $S_{2.10}$ having $\tfrac{1}{2}k(k + 1)$ degrees of freedom, the "extra" sum of squares associated with the fitting of a second degree polynomial. Since for the designs we consider the $b_i$'s are uncorrelated with $b_0$ and with the $b_{ij}$'s,

(122)  $S_{2.10} = b_0\{0y\} + \sum_{i=1}^{k}\sum_{j \ge i} b_{ij}\{ijy\} - \{0y\}^2/N$.

In specific examples other sub-divisions may be relevant. For instance, it might be suspected that a particular variable $x_i$ had no effect at all on the result. In this case it would be appropriate to isolate a sum of squares $S_i$ associated with the variable $x_i$. This could be done most conveniently by carrying out an analysis omitting $x_i$ and $x_i x_j$ ($j = 1, 2, \ldots, k$) from the model. The sum of squares associated with the reduced model would then be subtracted from the sum of squares associated with the full model to give $S_i$.

9.22 Analysis of residual sum of squares. Where blocking has not been used the overall residual sum of squares R may be analysed into two parts:

(i) The sum

(123)  $S_E = \sum_{u=1}^{n_0} (y_{u0} - \bar{y}_0)^2$,

where $y_{u0}$ is the uth repeated observation at the center of the design and $\bar{y}_0$ is the mean of the observations at the center. This sum of squares has $n_0 - 1$ degrees of freedom and $S_E/(n_0 - 1)$ provides an estimate of the experimental error variance $\sigma^2$ on the assumption that this variance is independent of the levels of the variables $x_i$.

(ii) $R - S_E$ having $N - (\tfrac{1}{2})(k + 2)(k + 1) - n_0 + 1$ degrees of freedom, the residual sum of squares which measures experimental error together with lack of fit of the assumed form of equation. When the corresponding mean square is large compared with the estimate of pure error obtained in (i) above, this implies that the assumed form of the equation is inadequate. A full discussion will be found in [16] and will be published elsewhere.

9.23 Analysis when blocking is employed. When blocking is employed a further sum of squares $S_B$ due to blocks is extracted so that the residual sum of squares R is now divided into three parts:

(i) $S_B$ having $B - 1$ degrees of freedom

(124)  $S_B = \sum_{b=1}^{B} n_b(\bar{y}_b - \bar{y})^2$,

where B is the number of blocks, $n_b$ is the number of observations in the bth block and $\bar{y}_b$ is the mean of the observations in the bth block.

(ii) The sum of squares $S_E$ corresponding to pure error and having $n_0 - B$ degrees of freedom. This is the sum of the individual sums of squares for repeated observations at the center of each block.

(iii) $R - S_B - S_E$, the residual which measures experimental error plus lack of fit.

The designs discussed are extremely convenient to use. The points at the 
center of the design allow a check to be made at a standard set of conditions 
while the experiment is being carried out and so help to show up gross errors.
Furthermore the center points provide an estimate of pure error from which it is 
possible to check the adequacy of the assumed form of equation without repli- 
cating the whole design. 







10. Simplification of confidence region for maximum. On the assumption 
that a second degree equation can adequately represent a response surface in 
the region of interest, a confidence region for the stationary point of this surface 
has been derived in [17]. Unfortunately the boundary of the confidence region 
is, in general, not easy to compute but Wallace [20] has devised valuable ap- 
proximations which are easy to compute and to appreciate. A very considerable 
simplification in the expression for the exact confidence interval occurs when a 
rotatable design is used to generate the experimental data. 

Suppose the second degree equation in k variables $x_1, \ldots, x_i, \ldots, x_k$ which has been fitted by least squares is written in the form

(125)  $y - a_0 = x'a + (\tfrac{1}{2})x'Ax$,

where $a$ is the $k \times 1$ vector $\{a_i\}$ and $A$ is the $k \times k$ matrix $\{a_{ij}\}$ and in terms of the notation previously adopted $a_0 = b_0$, $a_i = b_i$, $a_{ii} = 2b_{ii}$, $a_{ij} = b_{ij}$. The position of the center of the fitted system is obtained by equating to zero the elements of the $k \times 1$ vector $\delta$ defined by

(126)  $\delta = a + Ax$.

Thus if $x_0$ is the vector of the k coordinates of this center then

(127)  $a = -Ax_0$,

at which point the value of y is given by the equation

(128)  $y_0 = a_0 + (\tfrac{1}{2})x_0'a$.

The confidence region given in [17] is defined by the inequality:

(129)  $\delta' V^{-1} \delta \le s^2 k F_\alpha(k, \phi)$,

where $V\sigma^2 = E(\delta\delta')$, $s^2$ is an estimate of variance having $\phi$ degrees of freedom and distributed as $\chi^2\sigma^2/\phi$ independently of $\delta$, and $F_\alpha(k, \phi)$ is the $\alpha$ probability point of the F distribution having k and $\phi$ degrees of freedom.

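For a rotatable design, using (100), (101), and (102), the variances and covariances of the $\delta$'s are given by

(130)  $V(\delta_i) = N^{-1}\sigma^2 (c + \lambda^{-1} c^2 \rho^2 + \theta c^2 x_i^2), \quad \mathrm{Cov}(\delta_i, \delta_j) = N^{-1}\sigma^2 \theta c^2 x_i x_j$,

where

(131)  $\theta = \dfrac{k\lambda - (k-2)}{\lambda\{(k+2)\lambda - k\}}$,

$\rho^2 = x'x$, $c$ is the scale factor for the design defined in (98), and $\lambda = \lambda_4$. Thus

(132)  $NV = c(1 + \lambda^{-1}c\rho^2) \left\{ I + \dfrac{\theta c\, xx'}{1 + \lambda^{-1} c\rho^2} \right\}$.

The matrix $V$ is readily inverted using a theorem due to K. D. Tocher [18] to give

(133)  $V^{-1} = \dfrac{N}{c(1 + \lambda^{-1}c\rho^2)} \left\{ I - \dfrac{\theta c\, xx'}{1 + (\lambda^{-1} + \theta) c\rho^2} \right\}$.

Thus

(134)  $\delta' V^{-1} \delta = \dfrac{N}{c(1 + \lambda^{-1}c\rho^2)} \left\{ \delta'\delta - \dfrac{\theta c (\delta'x)^2}{1 + (\lambda^{-1} + \theta)c\rho^2} \right\}$.

Now using (126) with (127)

(135)  $\delta = A(x - x_0)$.

It follows that when the data have been generated by a rotatable design the confidence region for the stationary point is given by the expression

(136)  $\dfrac{N}{c(1 + \lambda^{-1}c\rho^2)} \left\{ (x - x_0)'A^2(x - x_0) - \dfrac{\theta c \{(x - x_0)'Ax\}^2}{1 + (\lambda^{-1} + \theta)c\rho^2} \right\} \le k s^2 F_\alpha(k, \phi)$.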


Now a fitted second degree equation can be interpreted most readily by writing it in the canonical form

(137)  $y - y_0 = \sum_{i=1}^{k} B_{ii} X_i^2$.

The k elements $B_{ii}$ are the latent roots of the matrix $(\tfrac{1}{2})A$. If a diagonal matrix $B$ is formed with the $B_{ii}$ for its elements then the latent vectors of $A$ form the k rows of an orthogonal matrix $M$ such that

(138)  $(\tfrac{1}{2})MA = BM$

and the matrix $X$ of "canonical variables" whose elements are $X_1, X_2, \ldots, X_k$ is defined by

(139)  $X = M(x - x_0)$.

The coordinates of the initial origin $x = 0$ in terms of the canonical variables are given by

(140)  $X_0 = -Mx_0$,

whence $x = M'(X - X_0)$. Thus the expression defining the confidence region reduces to the relatively simple expression

(141)  $\dfrac{4N}{c(1 + \lambda^{-1}c\rho^2)} \left\{ \sum_{i=1}^{k} B_{ii}^2 X_i^2 - \dfrac{\theta c \left\{\sum_{i=1}^{k} B_{ii} X_i (X_i - X_{i0})\right\}^2}{1 + (\lambda^{-1} + \theta)c\rho^2} \right\} \le k s^2 F_\alpha(k, \phi)$,

where

$\rho^2 = x'x = (X - X_0)'MM'(X - X_0) = \sum_{i=1}^{k} (X_i - X_{i0})^2$.

A rough delineation of the confidence region in a readily appreciated form can now be obtained by enumerating the points at which it cuts the canonical axes.





By direct substitution in (141) the point $(0, \ldots, 0, X_i, 0, \ldots, 0)$ will be included in the confidence region provided that

(142)  $B_{ii}^2 \le c k s^2 F_\alpha(k, \phi)\, \dfrac{\left\{1 + \lambda^{-1}c \sum_{j \ne i} X_{j0}^2 + \lambda^{-1}c (X_i - X_{i0})^2\right\}\left\{1 + (\lambda^{-1} + \theta)c \sum_{j \ne i} X_{j0}^2 + (\lambda^{-1} + \theta)c(X_i - X_{i0})^2\right\}}{4N X_i^2 \left\{1 + (\lambda^{-1} + \theta)c\sum_{j \ne i} X_{j0}^2 + \lambda^{-1}c(X_i - X_{i0})^2\right\}}$.

If the quantities $X_{10}, X_{20}, \ldots, X_{k0}$ are finite then as $X_i$ tends to infinity this becomes

(143)  $B_{ii}^2 \le (4N)^{-1} k F_\alpha(k, \phi) \{\lambda^{-1} + \theta\} c^2 s^2$,

which is the condition that the confidence interval includes points at $\pm\infty$ on the canonical axis $X_i$.
10.1 Redundancy of canonical variables. This result can be viewed from a 
somewhat different angle. In real problems one would expect (see for example 
reference [3]) that the underlying system would often be approximated by sur- 
faces containing a stationary line, plane or hyperplane rather than a stationary 
point. When this occurred one or more of the $B_{ii}$ calculated from the fitted equation would be close to zero and the corresponding canonical axes would delineate
the stationary line, plane or space. Analysis of this sort is of considerable prac- 
tical importance since it may help in the deduction of the underlying mecha- 
nism of the system [19]. 

An important question that normally arises is how small the canonical units $B_{ii}$ must be before we may conclude that zero values are not inconsistent with the data. Now for any rotatable design the variances of the coefficients are the same in every orientation and since the $B_{ii}$ are simply the "quadratic effects" in the directions of the canonical variables they have the same standard errors as have the quadratic effects $b_{ii}$ before transformation. Were it not that the small values of the $B_{ii}$ will be selected because they are small we might "test the significance" of the $B_{ii}$ by dividing by their standard errors and referring the quotient to tables of the t distribution. Now the estimated variance of the quadratic coefficients selected from a rotatable design with scale factor $c$ is $(4N)^{-1}(\lambda^{-1} + \theta)c^2 s^2$ and consequently the inequality (143) can be written

$\dfrac{|B_{ii}|}{\mathrm{s.e.}(B_{ii})} \le \sqrt{kF_\alpha(k, \phi)}$,

implying that we can "test the significance" of the $B_{ii}$ computed from a rotatable design by referring not to the t distribution but to the distribution of $\sqrt{kF(k, \phi)}$. This is a special case of a more general result derived by Wallace [20].


REFERENCES

[1] G. E. P. Box and K. B. Wilson, "On the experimental attainment of optimum conditions," J. Roy. Stat. Soc., Ser. B, Vol. 13 (1951), pp. 1-45.
[2] G. E. P. Box, "Multifactor designs of first order," Biometrika, Vol. 39 (1952), pp. 49-57.
[3] D. J. Finney, "The fractional replication of factorial experiments," Ann. Eugenics, Vol. 12 (1945), pp. 291-301.
[4] H. Hotelling, "Some improvements in weighing and other experimental techniques," Ann. Math. Stat., Vol. 15 (1944), pp. 297-306.
[5] R. L. Plackett and J. P. Burman, "The design of multifactorial experiments," Biometrika, Vol. 33 (1946), pp. 305-325.
[6] K. D. Tocher, "A note on the design problem," Biometrika, Vol. 39 (1952), pp. 109-117.
[7] A. C. Aitken, Determinants and Matrices, Oliver & Boyd, London, 1948.
[8] A. C. Aitken, "On the Wishart distribution in statistics," Biometrika, Vol. 36 (1949), pp. 59-62.
[9] J. H. M. Wedderburn, Lectures on Matrices, Amer. Math. Soc. Colloquium Publications, Vol. 17 (1934).
[10] G. E. P. Box, "Spherical distributions (preliminary report)," Ann. Math. Stat., Vol. 24 (1953), pp. 687-688.
[11] J. Clerk Maxwell, "Illustrations of the dynamical theory of gases. Part 1: On the motion and collisions of perfectly elastic spheres," Philos. Mag., Vol. 19 (1860), pp. 19-32.
[12] M. S. Bartlett, "The vector representation of a sample," Proc. Cambridge Philos. Soc., Vol. 30 (1934), pp. 327-340.
[13] G. E. P. Box, "The exploration and exploitation of response surfaces: Some general considerations and examples," Biometrics, Vol. 10 (1954), pp. 16-60.
[14] G. E. P. Box and J. S. Hunter, "The exploration and exploitation of response surfaces: Some useful designs," in preparation; to be submitted to Biometrics.
[15] R. M. DeBaun, "Block effects in the determination of optimum conditions," Biometrics, Vol. 12 (1956), pp. 20-22.
[16] G. E. P. Box, J. S. Hunter and R. J. Hader, "The effect of inadequate models in surface fitting," Technical Report #65, Institute of Statistics, Raleigh, N. C.
[17] G. E. P. Box and J. S. Hunter, "A confidence region for the solution of a set of simultaneous equations with an application to experimental design," Biometrika, Vol. 41 (1954), pp. 190-199.
[18] K. D. Tocher, Discussion to the paper "On the experimental attainment of optimum conditions," by Box and Wilson, J. Roy. Stat. Soc., Ser. B, Vol. 13 (1951), pp. 39-42.
[19] G. E. P. Box and P. V. Youle, "The exploration and exploitation of response surfaces: An example of the link between the fitted surface and the basic mechanism of the system," Biometrics, Vol. 11 (1955), pp. 287-323.
[20] D. Wallace, "Intersection region confidence procedures with an application to the location of the maximum in quadratic regression," submitted for publication.





NOTES 


CONSISTENCY OF CERTAIN TWO-SAMPLE TESTS 


By J. R. Blum¹ and Lionel Weiss²
Indiana University and University of Oregon 


1. Introduction and Summary. Let $X_1, \ldots, X_m$; $Y_1, \ldots, Y_n$ be independently distributed on the unit interval. Assume that the X's are uniformly distributed and that the Y's have an absolutely continuous distribution whose density $g(y)$ is bounded and has at most finitely many discontinuities. Let $Z_0 = 0$, $Z_{n+1} = 1$, and let $Z_1 < \cdots < Z_n$ be the values of the Y's arranged in increasing order. For each $i = 1, \ldots, n + 1$ let $S_i$ be the number of X's which lie in the interval $[Z_{i-1}, Z_i]$.

For each nonnegative integer r, let $Q_n(r)$ be the proportion of values among $S_1, \ldots, S_{n+1}$ which are equal to r. Suppose m and n approach infinity in the ratio $(m/n) = \alpha > 0$. In Section 2 it is shown that

$\lim_{n \to \infty} \sup_{r \ge 0} |Q_n(r) - Q(r)| = 0$

with probability one, where

$Q(r) = \alpha^r \int_0^1 \dfrac{g^2(y)}{[\alpha + g(y)]^{r+1}}\, dy$.

This result may be used to prove consistency of certain tests of the hypothesis that the two samples have the same continuous distribution. Several such examples are given in Section 3. A further property of one of these tests is briefly discussed in Section 4.
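
The limit stated above can be illustrated by simulation in the simplest case, where the Y's are also uniform, so that $g \equiv 1$ and $Q(r) = \alpha^r/(1+\alpha)^{r+1}$. The following is a minimal sketch under that assumption; the function names, seed and sample size are illustrative choices, not part of the paper.

```python
import random
from bisect import bisect_right
from collections import Counter


def q_n(xs, ys):
    """Empirical Q_n(r): share of the n+1 intervals holding exactly r X's."""
    cuts = sorted(ys)
    counts = [0] * (len(cuts) + 1)
    for x in xs:
        # bisect_right gives the number of Y's not exceeding x,
        # which is the index of the interval [Z_{i-1}, Z_i] containing x
        counts[bisect_right(cuts, x)] += 1
    tally = Counter(counts)
    return {r: c / len(counts) for r, c in tally.items()}


def q_limit_uniform(r, alpha):
    """Q(r) when g(y) = 1 on [0, 1]."""
    return alpha ** r / (1.0 + alpha) ** (r + 1)


random.seed(12345)
n, alpha = 20000, 1.0
m = int(alpha * n)
ys = [random.random() for _ in range(n)]
xs = [random.random() for _ in range(m)]

qn = q_n(xs, ys)
r_max = max(qn)
sup_diff = max(
    abs(qn.get(r, 0.0) - q_limit_uniform(r, alpha)) for r in range(r_max + 2)
)
print(round(sup_diff, 4))
```

Since $Q(r)$ is decreasing in r in this case, evaluating the difference up to one step beyond the largest observed count is enough to capture the supremum.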


2. The convergence theorem. With $Q_n(r)$ and $Q(r)$ defined as in the previous section we have the

THEOREM.

$P\{\lim_{n \to \infty} \sup_{r \ge 0} |Q_n(r) - Q(r)| = 0\} = 1$.

PROOF. We shall first prove that $\lim_{n \to \infty} Q_n(r) = Q(r)$ with probability one, where r is any positive integer. The proof for r = 0 is entirely analogous. To this end let $W_i = Z_i - Z_{i-1}$, $i = 1, \ldots, n + 1$ and define $V_i(r)$ by

$V_i(r) = 1$ if $S_i = r$, and $V_i(r) = 0$ otherwise, for $i = 1, \ldots, n + 1$.
Received November 4, 1955. 
¹ Research supported in part by the United States Air Force under Contract No. AF18(600)-685 monitored by the Office of Scientific Research.
² Research sponsored by the Office of Naval Research.







TWO-SAMPLE TESTS 


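Then

$Q_n(r) = \dfrac{1}{n+1}\sum_{i=1}^{n+1} V_i(r)$

and it follows that

(2.1)  $P\{V_i(r) = 1 \mid W_i = w_i\} = \dfrac{m!}{r!(m-r)!}\, w_i^r (1 - w_i)^{m-r}$,

$P\{V_i(r) = 1, V_j(r) = 1 \mid W_i = w_i, W_j = w_j\} = \dfrac{m!}{r!^2 (m-2r)!}\, w_i^r w_j^r (1 - w_i - w_j)^{m-2r}$ for $i \ne j$.

Let $u(w_1, \ldots, w_{n+1}; r) = E\{Q_n(r) \mid W_1 = w_1, \ldots, W_{n+1} = w_{n+1}\}$ and let $u(W_1, \ldots, W_{n+1}; r)$ be the corresponding random variable. From (2.1) we obtain

(2.2)  $u(W_1, \ldots, W_{n+1}; r) = \dfrac{1}{n+1}\,\dfrac{m!}{r!(m-r)!}\sum_{i=1}^{n+1} W_i^r(1 - W_i)^{m-r}$.

For each $t \ge 0$, let $R_n(t)$ be the proportion among $W_1, \ldots, W_{n+1}$ which do not exceed $t/n$, and let

$R(t) = 1 - \int_0^1 e^{-t g(y)} g(y)\, dy$.

It was shown by one of the authors [1] that

(2.3)  $P\{\lim_{n \to \infty} \sup_{t \ge 0} |R_n(t) - R(t)| = 0\} = 1$.

Setting $m = \alpha n$ we rewrite (2.2) as

(2.4)  $u(W_1, \ldots, W_{n+1}; r) = \dfrac{(\alpha n)!}{r!(\alpha n - r)!}\,\dfrac{1}{n+1}\sum_{i=1}^{n+1} W_i^r(1 - W_i)^{\alpha n - r} = \dfrac{\alpha^r}{r!}\prod_{j=0}^{r-1}\left[1 - \dfrac{j}{\alpha n}\right]\int_0^n t^r\left(1 - \dfrac{t}{n}\right)^{\alpha n - r} dR_n(t)$.

Using (2.3), (2.4), and a straightforward analytic argument we find that

(2.5)  $\lim_{n \to \infty} u(W_1, \ldots, W_{n+1}; r) = \dfrac{\alpha^r}{r!}\int_0^\infty t^r e^{-\alpha t}\, dR(t) = \dfrac{\alpha^r}{r!}\int_0^1\int_0^\infty t^r e^{-t[\alpha + g(y)]} g^2(y)\, dt\, dy = \dfrac{\alpha^r}{r!}\,\Gamma(r+1)\int_0^1 \dfrac{g^2(y)}{[\alpha + g(y)]^{r+1}}\, dy = Q(r)$,

with probability one.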





244 J. R. BLUM AND LIONEL WEISS 


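Next we use Chebyshev's inequality and (2.1) to obtain

(2.6)  $P\{|Q_n(r) - u(W_1, \ldots, W_{n+1}; r)| \ge \epsilon\} \le \dfrac{1}{\epsilon^2 (n+1)^2} \left\{ \sum_{i=1}^{n+1} E\left[V_i(r) - \dfrac{m!}{r!(m-r)!} W_i^r (1 - W_i)^{m-r}\right]^2 + \sum_{i \ne j} E\left[V_i(r) - \dfrac{m!}{r!(m-r)!} W_i^r(1 - W_i)^{m-r}\right]\left[V_j(r) - \dfrac{m!}{r!(m-r)!} W_j^r(1 - W_j)^{m-r}\right] \right\}$

for every $\epsilon > 0$.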


On examining the right-hand side of (2.6) we find that the first sum is 
O[1/(m + 1)] since each term is nonnegative and bounded by one. 

The same holds true for the second sum. This can be seen by rewriting each term in the form

$\left( \dfrac{m!}{r!^2 (m-2r)!}\, W_i^r W_j^r (1 - W_i - W_j)^{m-2r} - \left[\dfrac{m!}{r!(m-r)!}\right]^2 W_i^r W_j^r (1 - W_i)^{m-r}(1 - W_j)^{m-r} \right)$

and maximizing the sum subject to the conditions $0 \le W_i \le 1$, $\sum W_i = 1$. Now we use (2.5), (2.6), and the Borel-Cantelli lemma to obtain

(2.7)  $P\{\lim_{n \to \infty} Q_{n^2}(r) = Q(r)\} = 1$.

For each positive integer n, let $k(n)$ be the positive integer satisfying $[k(n) - 1]^2 < n \le k^2(n)$. From the definition of $Q_n(r)$ it follows immediately that $|[k^2(n) + 1]Q_{k^2(n)}(r) - (n + 1)Q_n(r)| \le \alpha|k^2(n) - n|$. From this and the fact that $\lim_{n \to \infty} [k^2(n)/n] = 1$ we see that (2.7) implies

$P\{\lim_{n \to \infty} Q_n(r) = Q(r)\} = 1$.

To complete the proof of the theorem we merely note that the uniformity of the convergence is an immediate consequence of the fact that for each integer n we have

$\sum_{r=0}^{\infty} Q_n(r) = 1 = \sum_{r=0}^{\infty} Q(r)$.

3. Applications. Most tests proposed for testing the hypothesis that two samples come from the same continuous distribution are based on the ranks of one set of observations in the combined ordered sample. In our terminology these are the tests based on the statistics $S_1, \ldots, S_{n+1}$. In this section we give several examples of such tests which can be shown to be consistent against wide classes of alternatives by applying the theorem of the previous section. Throughout this section we make the usual assumption, valid for rank tests, that the X's are uniformly distributed on the unit interval and that the range of the Y's is also the unit interval.

As a first example we consider the run test proposed by Wald and Wolfowitz 
[2]. Let U be the number of runs of X’s and Y’s in the combined ordered sample. 
The hypothesis is rejected when U/m is too small. From the definition of Q,(r) 
it follows easily that ' 


U n 


+ 1 er 
-2-— [1 — Q.(0)] | $ 5 


\m 
Thus if g(y) is any density satisfying the assumptions of Section 1 we find that 
U/m converges with probability one to 


27,_ fp’ ¢&) |- pe gy) 
21 lL ea* -2/ a+ oy 


and the test is consistent against all such densities for which 


eas [ gv) 4 
ati” hatgy) 
From a simple variational argument it follows that this holds for all such densi- 
ties which are positive almost everywhere and differ from one on a set of posi- 
tive measure. This result was obtained in [2]. 
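
The run count U can be illustrated with a short sketch. The reading of $S_1, \ldots, S_{n+1}$ as the numbers of Y's falling in the n + 1 intervals cut out by the ordered X's is an assumption made here for illustration (their definition lies outside this section), and the function names `count_runs` and `interval_counts` are hypothetical. Under that reading, the number of nonempty intervals equals the number of runs of Y's, so U differs from twice that number by at most one.

```python
import bisect

def count_runs(xs, ys):
    """Number of runs U of X's and Y's in the combined ordered sample."""
    labels = [lab for _, lab in sorted([(x, "X") for x in xs] + [(y, "Y") for y in ys])]
    if not labels:
        return 0
    # A new run starts at every change of label.
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def interval_counts(xs, ys):
    """Assumed S_1, ..., S_{n+1}: Y-counts in the n + 1 intervals cut by the ordered X's."""
    cuts = sorted(xs)
    counts = [0] * (len(cuts) + 1)
    for y in ys:
        counts[bisect.bisect_left(cuts, y)] += 1
    return counts

xs = [0.1, 0.2, 0.7]   # illustrative X sample
ys = [0.3, 0.4, 0.9]   # illustrative Y sample
u = count_runs(xs, ys)
s = interval_counts(xs, ys)
y_runs = sum(1 for c in s if c > 0)   # intervals holding at least one Y
print(u, s, y_runs)
```

For this small sample the ordered labels are X X Y Y X Y, giving four runs and two nonempty intervals.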
Let k be a positive integer and let U, be the number of intervals [Z,_, , Z;] 


containing at most k X’s. Consider the class of alternative densities g(y) for 
which 


[ a. sand 
Jo [a + g(y)}** 


By an argument similar to the one given in the last paragraph it follows that the 
test which rejects when U, is too large is consistent against alternative densities 
in this class. 

As a third example we consider the test which rejects when V° = 
(I/(n + 1)) ORY S? = DOR0r*Q,(r) is too large. We note that V? = 
[(m?/(n + 1)]{C° + [1/(n + 1)]} where C’ is the statistic first proposed by Dixon 
[3]. Dixon computed the mean and variance of C’ under the assumption that 
the hypothesis is true. Using these results we find that 


1 
dy < @a pe: 


m(n + 2m) 


I ia cee eee ree ea 
mr (n + 1)(n + 2) 


dem ie — e+ ot ete SS 
r (n + 1)*(n + 2)2(n + 3)(n + 4) 







when the hypothesis is true. It follows that under this assumption $V^2$ converges
stochastically to $2a^2 + a$. Now if g(y) is any density satisfying the hypothesis
of our convergence theorem it is easily verified that


nw r=Q 


Pa 1 q 
lim inf >> r°Q,(r) = 2a" | dy +a 
0 gly) 


with probability one. Thus the test is consistent for any such density for which 
1 J \ 
L < fo dy / g(y). 


4. A further property of a test. In this section we discuss briefly another 
aspect of the last test considered in the previous section. Let $\{g_c(y)\}$ be the
class of alternatives defined by $g_c(y) = 1 + c(y - \tfrac{1}{2})$ where $0 < |c| \le 2$. Let
$(t_1, \ldots, t_{n+1})$ be n + 1 integers with $0 \le t_1 \le \cdots \le t_{n+1}$ and $\sum_{i=1}^{n+1} t_i = m$,
and let U be the set of (n + 1)-tuplets $(s_1, \ldots, s_{n+1})$ of nonnegative integers
which, when reordered according to size, yield the numbers $t_1, \ldots, t_{n+1}$. Further
let $P_c(U)$ be the probability of the set U computed under the assumption
that $g_c(y)$ is the density of the Y's. $P_c(U)$ can be written down in the form of an
integral over the n-dimensional unit cube. After appropriate integration it turns
out that


dP(U) &P(U) | << . 
° = miatenajeitaniiliatan = = I 
Tt 0 and We ; a > t + 5b, 


c= i=l 


where a and b are positive numbers depending only on m and n. As a consequence
we find that if we restrict ourselves to tests which are symmetric in the
variables $S_1, \ldots, S_{n+1}$ then the test which rejects the hypothesis when $V^2$ is
too large maximizes the slope of the power function at c = 0. In the Neyman-Pearson
terminology, the test is of type A among the class of symmetric tests
of the hypothesis.


REFERENCES 


[1] L. Weiss, "The stochastic convergence of a function of sample successive differences,"
Ann. Math. Stat., Vol. 26 (1955), pp. 532-536.

[2] A. Wald and J. Wolfowitz, "On a test whether two samples are from the same population,"
Ann. Math. Stat., Vol. 11 (1940), pp. 147-162.

[3] W. J. Dixon, "A criterion for testing the hypothesis that two samples are from the
same population," Ann. Math. Stat., Vol. 11 (1940), pp. 199-204.





A NOTE ON TRUNCATION AND SUFFICIENT STATISTICS¹


By Walter L. Smith
University of North Carolina


1. Introduction and summary. Generalizing earlier observations by Fisher 
and Hotelling, Tukey [1] showed that if a family of distributions admits a set 
of sufficient statistics, then the family obtained by truncation to a fixed set, 
or by a fixed selection, also admits the same set of sufficient statistics (this word- 
ing is Tukey’s; we give a precise mathematical statement later). Tukey’s proof 
assumed the relevant family of probability measures to be dominated by a fixed 
measure function and made use of the factorization theorem concerning sufficient 
statistics in this case. In the present short note we shall first re-prove Tukey’s 
result without assuming domination (and, hence, without appealing to the 
factorization theorem). Then we shall show that, under general conditions, if a 
sufficient statistic has one or more of the properties of completeness, bounded
completeness, or minimality, before truncation, then it preserves such after 
truncation. 

The treatment is on the lines of the abstract discussion of sufficient statistics 
given by Halmos and Savage [2]. We shall assume familiarity with the results 
given in this latter paper. For definitions of completeness, bounded complete- 
ness, and minimality, and for a discussion of the significance of these concepts 
we refer to Lehmann and Scheffé [3]. 


2. On $\phi$-truncation. Let $X$ be an abstract space of elements $x$, and let $\mathcal{F}_x$ be
a (Borel) field of subsets of $X$. We write $\{\mu_\theta;\ \theta \in \Omega\}$ for a family of probability
measures on $(X, \mathcal{F}_x)$, where $\Omega$ is an abstract parameter space. The statistic $t$ is a
mapping of $X$ onto another abstract space $T$, that is to say, we suppose for simplicity,
with no loss of generality, that $T$ is precisely the range $\{t(x): x \in X\}$ of
the mapping $t$. If $B \subset T$, we write $t^{-1}B = \{x: t(x) \in B\}$ for the origin of $B$. The
class of all $B \subset T$ such that $t^{-1}B \in \mathcal{F}_x$ is written $\mathcal{F}_t$; it is easy to show that $\mathcal{F}_t$ is
a (Borel) field.

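We shall write $\mathcal{E}_\theta$ for expectation based on $\mu_\theta$. If $f(x)$ is any $(\mathcal{F}_x)$-measurable
function such that $\mathcal{E}_\theta|f(x)| < \infty$, for all $\theta$ in some set $A \subset \Omega$, we shall say $f(x)$ is
$A$-integrable. If $f(x)$ is $\Omega$-integrable, the conditional expectation $\mathcal{E}_\theta(f(x) \mid t)$ is given
by the Radon-Nikodym derivative

(1)    $\mathcal{E}_\theta(f(x) \mid t) = \dfrac{d\nu_\theta t^{-1}}{d\mu_\theta t^{-1}},$

where the measure $\nu_\theta$ is defined by

(2)    $d\nu_\theta = f(x)\, d\mu_\theta.$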


Received December 19, 1955. 


1 This research was supported in part by the United States Air Force through the Office 
of Scientific Research of the Air Research and Development Command. 


For a proof of this assertion, see [2]. Note that the derivative in (1) is arbitrary
on a set of $(\mu_\theta t^{-1})$-measure zero. If it transpires that for each $\Omega$-integrable $f(x)$
the conditional expectation $\mathcal{E}_\theta(f(x) \mid t)$ may be taken as independent of $\theta \in \Omega$,
then $t$ is sufficient for $\{\mu_\theta;\ \theta \in \Omega\}$.

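Suppose that $\phi(x)$ is a non-negative $\Omega$-integrable function; define $\Omega_\phi =
\{\theta: \mathcal{E}_\theta \phi(x) > 0\}$; and define a new family $\{\mu_\theta^\phi;\ \theta \in \Omega_\phi\}$ of probability measures
on $(X, \mathcal{F}_x)$ by the equation

(3)    $d\mu_\theta^\phi = \dfrac{\phi(x)\, d\mu_\theta}{\mathcal{E}_\theta \phi(x)}.$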


We call $\{\mu_\theta^\phi;\ \theta \in \Omega_\phi\}$ the $\phi$-truncation of $\{\mu_\theta;\ \theta \in \Omega\}$. When $\phi(x)$ is the characteristic
function of some set $A \in \mathcal{F}_x$, then $\phi$-truncation corresponds to truncation
to a fixed set, in the usual sense. When $\phi(x)$ is bounded above it is easy to
see that it may be assumed to be bounded above by unity, and then $\phi$-truncation
corresponds to what Tukey [1] has called fixed selection. However, $\phi(x)$ may
not be bounded above.
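
A small numerical sketch may help fix the idea of truncation to a fixed set. It is an illustration only, not part of the argument: the family (two independent Poisson counts with common mean $\theta$, for which $t(x) = x_1 + x_2$ is sufficient), the truncation set, and the names `phi` and `truncated_conditional` are all assumptions chosen here. The code computes the conditional distribution of $x$ given $t$ under the $\phi$-truncated family for two values of $\theta$ and checks that the two agree.

```python
import math
from collections import defaultdict

def poisson_pmf(k, theta):
    return math.exp(-theta) * theta ** k / math.factorial(k)

def phi(x):
    # Indicator of a hypothetical truncation set: small counts, origin excluded.
    x1, x2 = x
    return 1.0 if (x1 <= 4 and x2 <= 3 and (x1, x2) != (0, 0)) else 0.0

def truncated_conditional(theta, support):
    """Conditional law of x given t = x1 + x2 under the phi-truncated family."""
    weights = {x: phi(x) * poisson_pmf(x[0], theta) * poisson_pmf(x[1], theta)
               for x in support}
    norm = sum(weights.values())            # plays the role of E_theta phi(x)
    totals = defaultdict(float)             # truncated probability of each t
    for x, w in weights.items():
        totals[x[0] + x[1]] += w / norm
    return {x: (w / norm) / totals[x[0] + x[1]]
            for x, w in weights.items() if w > 0}

# phi vanishes outside this grid, so the grid carries the whole truncated family.
support = [(i, j) for i in range(5) for j in range(4)]
cond_a = truncated_conditional(0.7, support)
cond_b = truncated_conditional(2.5, support)
max_gap = max(abs(cond_a[x] - cond_b[x]) for x in cond_a)
print(max_gap)
```

The gap is at rounding level, which is what part (i) of the theorem below leads one to expect for this example.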

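We sometimes write, for brevity, $\{\mu_\theta\}$ for $\{\mu_\theta;\ \theta \in \Omega\}$; $\{\mu_\theta^\phi\}$ for $\{\mu_\theta^\phi;\ \theta \in \Omega_\phi\}$;
and $\{\mu_\theta\}_\phi$ for $\{\mu_\theta;\ \theta \in \Omega_\phi\}$.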

We shall also write $\mathcal{E}_\theta^\phi$ for expectations based on $\mu_\theta^\phi$; and if $f(x)$ is any $(\mathcal{F}_x)$-measurable
function such that $\mathcal{E}_\theta^\phi|f(x)| < \infty$, for all $\theta$ in some set $A \subset \Omega$, we
shall say $f(x)$ is $A^\phi$-integrable. If a statement is followed by an expression like
$[\mathfrak{M}]$, where $\mathfrak{M}$ represents a family of probability measures, this will mean that
the statement is false, at most, on a set of probability zero for each measure of $\mathfrak{M}$.
In this connection let us notice that, since $\{\mu_\theta\}_\phi$ dominates $\{\mu_\theta^\phi\}$, we can always
replace $[\{\mu_\theta\}_\phi]$ by $[\{\mu_\theta^\phi\}]$, $[\{\mu_\theta t^{-1}\}_\phi]$ by $[\{\mu_\theta^\phi t^{-1}\}]$.

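LEMMA. If $f(x)$ is $\Omega_\phi^\phi$-integrable, then $\phi(x)f(x)$ is $\Omega_\phi$-integrable.

PROOF. If $\theta \in \Omega_\phi$, and we define $X_\phi = \{x: \phi(x) > 0\}$,

    $\mathcal{E}_\theta|\phi(x)f(x)| = \int_{X_\phi} |\phi(x)f(x)|\, d\mu_\theta$

    $\qquad = \{\mathcal{E}_\theta\phi(x)\} \int_{X_\phi} |f(x)|\, d\mu_\theta^\phi$

    $\qquad = \{\mathcal{E}_\theta\phi(x)\}\{\mathcal{E}_\theta^\phi|f(x)|\}$

    $\qquad < \infty.$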


THEOREM. (i) If $t$ is sufficient for $\{\mu_\theta;\ \theta \in \Omega\}$, then $t$ is sufficient for $\{\mu_\theta^\phi;\ \theta \in \Omega_\phi\}$;
(ii) if, in addition, $t$ is complete, minimal, for $\{\mu_\theta;\ \theta \in \Omega_\phi\}$, then $t$ is complete,
minimal, for $\{\mu_\theta^\phi;\ \theta \in \Omega_\phi\}$; and a similar remark applies if $t$ is boundedly complete,
provided, in this case, that $\phi(x)$ is bounded above.

(Notice that we require $t$ to be complete, etc., for the subfamily $\{\mu_\theta;\ \theta \in \Omega_\phi\}$.
It is possible that $t$ be complete, etc., for $\{\mu_\theta;\ \theta \in \Omega\}$ and not so for $\{\mu_\theta;\ \theta \in \Omega_\phi\}$.)

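PROOF. We observe first that by (1), (2), and (3), for $\theta \in \Omega_\phi$,

(4)    $\dfrac{d\mu_\theta^\phi t^{-1}}{d\mu_\theta t^{-1}} = \dfrac{\mathcal{E}(\phi(x) \mid t)}{\mathcal{E}_\theta\phi(x)},$

where we may omit the suffix $\theta$ in $\mathcal{E}_\theta(\phi(x) \mid t)$ because $t$ is sufficient for $\{\mu_\theta;\ \theta \in \Omega\}$.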





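Let $f(x)$ be any $\Omega_\phi^\phi$-integrable function. Then by the lemma, $\phi(x)f(x)$ is $\Omega_\phi$-integrable,
and by the sufficiency of $t$ for $\{\mu_\theta;\ \theta \in \Omega\}$, and hence for
$\{\mu_\theta;\ \theta \in \Omega_\phi \subset \Omega\}$, we may write $\mathcal{E}\{f(x)\phi(x) \mid t\}$ independent of $\theta \in \Omega_\phi$. Thus for
any $A \in \mathcal{F}_t$ we have for all $\theta \in \Omega_\phi$,

    $\int_A \mathcal{E}(f(x)\phi(x) \mid t)\, d\mu_\theta t^{-1} = \int_{t^{-1}A} f(x)\phi(x)\, d\mu_\theta$

    $\qquad = \{\mathcal{E}_\theta\phi(x)\} \int_{t^{-1}A} f(x)\, d\mu_\theta^\phi$, by (3),

    $\qquad = \{\mathcal{E}_\theta\phi(x)\} \int_A \mathcal{E}_\theta^\phi(f(x) \mid t)\, d\mu_\theta^\phi t^{-1}$

    $\qquad = \int_A \mathcal{E}_\theta^\phi(f(x) \mid t)\, \mathcal{E}(\phi(x) \mid t)\, d\mu_\theta t^{-1},$

by (4). Since the last equation holds for every $A \in \mathcal{F}_t$, we deduce from the
Radon-Nikodym theorem that

(5)    $\mathcal{E}_\theta^\phi(f(x) \mid t)\, \mathcal{E}(\phi(x) \mid t) = \mathcal{E}(f(x)\phi(x) \mid t), \quad [\mu_\theta t^{-1}].$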


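The function $\phi(x)$ is non-negative on $X$, from which it follows that
$\mathcal{E}(\phi(x) \mid t) \ge 0$, $[\{\mu_\theta t^{-1}\}_\phi]$. Write $T_\phi = \{t: \mathcal{E}(\phi(x) \mid t) > 0\}$; then for $t \in T - T_\phi$,
we have $\mathcal{E}(\phi(x) \mid t) = 0$, $[\{\mu_\theta t^{-1}\}_\phi]$. Thus, if $\theta \in \Omega_\phi$,

    $\int_{T - T_\phi} d\mu_\theta^\phi t^{-1} = \int_{T - T_\phi} \dfrac{d\mu_\theta^\phi t^{-1}}{d\mu_\theta t^{-1}}\, d\mu_\theta t^{-1}
      = \int_{T - T_\phi} \dfrac{\mathcal{E}(\phi(x) \mid t)}{\mathcal{E}_\theta\phi(x)}\, d\mu_\theta t^{-1} = 0.$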


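We have therefore shown that $\mathcal{E}(\phi(x) \mid t) > 0$, $[\{\mu_\theta^\phi t^{-1}\}]$, and may deduce from
(5) that

    $\mathcal{E}_\theta^\phi(f(x) \mid t) = \dfrac{\mathcal{E}(f(x)\phi(x) \mid t)}{\mathcal{E}(\phi(x) \mid t)}, \quad [\{\mu_\theta^\phi t^{-1}\}].$

Hence for any $\Omega_\phi^\phi$-integrable function $f(x)$ there exists a version of $\mathcal{E}_\theta^\phi(f(x) \mid t)$
which is independent of $\theta \in \Omega_\phi$. This is enough to prove that $t$ is sufficient for
$\{\mu_\theta^\phi;\ \theta \in \Omega_\phi\}$.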

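Next suppose that $t$ is a complete sufficient statistic for $\{\mu_\theta;\ \theta \in \Omega_\phi\}$. To prove
that $t$ is complete for $\{\mu_\theta^\phi;\ \theta \in \Omega_\phi\}$, we must show that if $\psi(t)$ is an arbitrary
$\mathcal{F}_t$-measurable function such that $\mathcal{E}_\theta^\phi\psi(t(x)) = 0$ for all $\theta \in \Omega_\phi$, then $\psi(t) = 0$,
$[\{\mu_\theta^\phi t^{-1}\}]$. However, if

    $\int_T \psi(t)\, d\mu_\theta^\phi t^{-1} = 0$, all $\theta \in \Omega_\phi$,

it follows from (4) that

    $\int_T \psi(t)\, \mathcal{E}(\phi(x) \mid t)\, d\mu_\theta t^{-1} = 0$, all $\theta \in \Omega_\phi$.

But $t$ is complete for $\{\mu_\theta;\ \theta \in \Omega_\phi\}$, and $\psi(t)\mathcal{E}(\phi(x) \mid t)$ is an $(\mathcal{F}_t)$-measurable function
of $t$. Hence

    $\psi(t)\, \mathcal{E}(\phi(x) \mid t) = 0, \quad [\{\mu_\theta t^{-1}\}_\phi].$

Since we have already seen that $\mathcal{E}(\phi(x) \mid t) > 0$, $[\{\mu_\theta^\phi t^{-1}\}]$, the proof that $t$ is
complete for $\{\mu_\theta^\phi;\ \theta \in \Omega_\phi\}$ is thus evident.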

When $t$ is boundedly complete we can employ precisely the same argument, assuming
both $\psi(t)$ and $\phi(x)$ to be bounded so as to ensure, as is easily checked,
that $\psi(t)\mathcal{E}(\phi(x) \mid t)$ is bounded $[\{\mu_\theta t^{-1}\}_\phi]$.

Lastly we deal with the minimality question. Suppose that $s(x)$ is any statistic
defined on $X$ which is sufficient for $\{\mu_\theta^\phi;\ \theta \in \Omega_\phi\}$. Write $S = \{s(x): x \in X\}$ for
the abstract space on which $s$ maps $X$. Then to prove that $t$ is minimal we must
demonstrate the existence of a mapping $h$ of $S$ on $T$ such that $t(x) = h(s(x))$,
$[\{\mu_\theta^\phi\}]$.

Recall $X_\phi = \{x: \phi(x) > 0\}$, and notice that since $\theta \in \Omega_\phi$ implies $\mathcal{E}_\theta\phi(x) > 0$,
it also implies that $\mu_\theta(X_\phi) > 0$. Plainly, $\mu_\theta^\phi(X - X_\phi) = 0$ for all $\theta \in \Omega_\phi$; thus
it will be enough to prove the relation $t(x) = h(s(x))$, $[\{\mu_\theta^\phi\}]$, merely on $X_\phi$.
To this end, let us define a new statistic

    $s_1(x) = s(x)$  if $x \in X_\phi$,
    $\phantom{s_1(x)} = x$  if $x \in X - X_\phi$.

Since $s(x)$ is obviously a one-valued function of $s_1(x)$, it follows that $s_1(x)$ is also
a sufficient statistic for $\{\mu_\theta^\phi;\ \theta \in \Omega_\phi\}$ (by Theorem 6.4 of Bahadur [4]). We
write $S_1 = \{s_1(x);\ x \in X\}$ and $\mathcal{F}_{s_1}$ for the (Borel) field of all subsets $B \subset S_1$ such
that $s_1^{-1}B \in \mathcal{F}_x$.
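If we set

    $\psi(x) = 1/\phi(x)$,  $x \in X_\phi$,
    $\phantom{\psi(x)} = 0$,  $x \in X - X_\phi$,

then $\psi(x)$ is a non-negative $(\mathcal{F}_x)$-measurable function on $X$; and for $\theta \in \Omega_\phi$,

(6)    $\mathcal{E}_\theta^\phi\psi(x) = \int_{X_\phi} \dfrac{1}{\phi(x)}\, d\mu_\theta^\phi
        = \int_{X_\phi} \dfrac{1}{\phi(x)}\, \dfrac{\phi(x)}{\mathcal{E}_\theta\phi(x)}\, d\mu_\theta$, by (3),
       $\qquad = \dfrac{\mu_\theta(X_\phi)}{\mathcal{E}_\theta\phi(x)}.$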
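It follows from (6) that $\psi(x)$ is $\Omega_\phi^\phi$-integrable, and so we may consider the $\psi$-truncation
of $\{\mu_\theta^\phi;\ \theta \in \Omega_\phi\}$. We shall employ obvious extensions of our notation, and
observe that by (6), $\Omega_\psi^\phi = \{\theta: \theta \in \Omega_\phi,\ \mathcal{E}_\theta^\phi\psi(x) > 0\} = \Omega_\phi$. Hence, for all $\theta \in \Omega_\phi$,

(7)    $d\mu_\theta^{\phi\psi} = 0$,  if $x \in X - X_\phi$,

(8)    $d\mu_\theta^{\phi\psi} = \dfrac{\psi(x)}{\mathcal{E}_\theta^\phi\psi(x)}\, d\mu_\theta^\phi = \dfrac{d\mu_\theta}{\mu_\theta(X_\phi)}$, by (3) and (6),  if $x \in X_\phi$.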

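Because $s_1$ is sufficient for $\{\mu_\theta^\phi;\ \theta \in \Omega_\phi\}$, and because $\Omega_\psi^\phi = \Omega_\phi$, it follows
from the first part of our theorem that $s_1$ is also sufficient for the $\psi$-truncation
$\{\mu_\theta^{\phi\psi};\ \theta \in \Omega_\phi\}$. Let $f(x)$ be any $\Omega_\phi$-integrable function. Then we notice that for
$\theta \in \Omega_\phi$, by (7) and (8),

    $\mathcal{E}_\theta^{\phi\psi}|f(x)| = \dfrac{1}{\mu_\theta(X_\phi)} \int_{X_\phi} |f(x)|\, d\mu_\theta < \infty,$

i.e., $f(x)$ is $\Omega_\phi^{\phi\psi}$-integrable; and by the sufficiency of $s_1$ there must exist a function
$g(s_1) = \mathcal{E}^{\phi\psi}(f(x) \mid s_1)$ which is independent of $\theta \in \Omega_\phi$, is such that $g(s_1(x))$ is
$(\mathcal{F}_x)$-measurable, and is such that for any $B \in \mathcal{F}_{s_1}$ and all $\theta \in \Omega_\phi$,

    $\int_{s_1^{-1}B} f(x)\, d\mu_\theta^{\phi\psi} = \int_{s_1^{-1}B} g(s_1(x))\, d\mu_\theta^{\phi\psi}.$

This implies, by (7), that

    $\int_{X_\phi \cap s_1^{-1}B} f(x)\, d\mu_\theta^{\phi\psi} = \int_{X_\phi \cap s_1^{-1}B} g(s_1(x))\, d\mu_\theta^{\phi\psi},$

and so, by (8), that

(9)    $\int_{X_\phi \cap s_1^{-1}B} f(x)\, d\mu_\theta = \int_{X_\phi \cap s_1^{-1}B} g(s_1(x))\, d\mu_\theta.$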


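Finally, define an $(\mathcal{F}_x)$-measurable function

    $g^*(s_1(x)) = g(s_1(x))$  if $s_1 \in S$, i.e., if $x \in X_\phi$,
    $\phantom{g^*(s_1(x))} = f(s_1(x))\ (= f(x))$  if $s_1 \in X - X_\phi$, i.e., if $x \in X - X_\phi$.

Thus

    $\int_{s_1^{-1}B} f(x)\, d\mu_\theta = \int_{X_\phi \cap s_1^{-1}B} f(x)\, d\mu_\theta + \int_{(X - X_\phi) \cap s_1^{-1}B} f(x)\, d\mu_\theta$

    $\qquad = \int_{X_\phi \cap s_1^{-1}B} g(s_1(x))\, d\mu_\theta + \int_{(X - X_\phi) \cap s_1^{-1}B} g^*(s_1(x))\, d\mu_\theta$

    $\qquad = \int_{s_1^{-1}B} g^*(s_1(x))\, d\mu_\theta,$
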


the last equality following from (9). Since $f(x)$ is an arbitrary $\Omega_\phi$-integrable function,
this last equation, being true for all $B \in \mathcal{F}_{s_1}$, shows that $s_1$ is a sufficient statistic
for $\{\mu_\theta;\ \theta \in \Omega_\phi\}$. But we are given that $t$ is minimal sufficient for $\{\mu_\theta;\ \theta \in \Omega_\phi\}$.
Hence there is a mapping $h$ of $S_1$ onto $T$ such that $t(x) = h(s_1(x))$, $[\{\mu_\theta\}_\phi]$.
If we now restrict $x$ to $X_\phi$ it is evident that $t(x) = h(s(x))$, $[\{\mu_\theta^\phi\}]$, as was to
be proved.


REFERENCES 

[1] J. W. Tukey, "Sufficiency, truncation, and selection," Ann. Math. Stat., Vol. 20 (1949),
pp. 309-311.

[2] P. R. Halmos and L. J. Savage, "Application of the Radon-Nikodym theorem to the
theory of sufficient statistics," Ann. Math. Stat., Vol. 20 (1949), pp. 225-241.

[3] E. L. Lehmann and H. Scheffé, "Completeness, similar regions and unbiased estimation,
Part I," Sankhyā, Vol. 10 (1950), pp. 305-340.

[4] R. R. Bahadur, "Sufficiency and statistical decision functions," Ann. Math. Stat.,
Vol. 25 (1954), pp. 423-462.


A CENTRAL LIMIT THEOREM FOR MULTILINEAR STOCHASTIC 
PROCESSES 


By Emanuel Parzen
Stanford University 


1. Introduction. Let the random sequence $X(t)$ be observed for $t = 1, 2, \ldots$,
and let $S(n) = X(1) + \cdots + X(n)$ be its consecutive sums. The random sequence
may be said to obey the classical central limit theorem if, for any real
number $a$,

(1.1)    $\lim_{n \to \infty} \mathrm{Prob}\left\{\dfrac{S(n) - ES(n)}{\sigma[S(n)]} < a\right\} = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-y^2/2}\, dy.$


Because of the importance of the central limit theorem in establishing the 
properties of statistical tests and estimates, it would appear that in order to 
develop a satisfactory theory of statistical inference for stochastic processes 


which are random sequences of dependent random variables, it is necessary to 
establish a central limit theorem for such processes. Diananda [2] has proved 
a central limit theorem for discrete parameter stochastic processes which are 
linear processes. We here introduce a class of stochastic processes which we call 
multilinear processes, for which we prove a central limit theorem. The results 
are capable of extension to the continuous parameter case, but we do not do so 
here. 


Received December 19, 1955; revised August 16, 1956. 







2. Multilinear processes. A random sequence $X(t)$, defined for $t = 0, \pm 1, \ldots$,
is said to be a multilinear process if it can be represented as follows:

(2.1)    $X(t) = \sum_{\nu_1, \ldots, \nu_K = -\infty}^{\infty} a(\nu_1, \ldots, \nu_K)\, W_1(t - \nu_1) \cdots W_K(t - \nu_K),$

where $K$ is a positive integer; the $a(\nu_1, \ldots, \nu_K)$ are constants, defined for $\nu_i = 0,
\pm 1, \ldots$ and $i = 1, \ldots, K$, such that

(2.2)    $\sum_{\nu_1, \ldots, \nu_K = -\infty}^{\infty} |a(\nu_1, \ldots, \nu_K)| < \infty,$

and the $W_i(t)$ are random variables, defined for $t = 0, \pm 1, \ldots$ and $i = 1, \ldots,
K$, such that the $K$-dimensional random vectors $W(t) = [W_1(t), \ldots, W_K(t)]$
are independent. For the sake of clarity, we give the definition of independence;
the $W(t)$ are independent if, for any integer $n$, for any set of points $t_1, \ldots, t_n$,
and any set of bounded Borel functions of $K$ variables $g_1(w), \ldots, g_n(w)$, it
holds that

(2.3)    $E[g_1(W(t_1)) \cdots g_n(W(t_n))] = Eg_1(W(t_1)) \cdots Eg_n(W(t_n)).$

It is also assumed that the random variables $W_i(t)$ satisfy the condition that,
for some $\alpha > 2$ and constant $C$,

(2.4)    $E\,|W_1(t_1) \cdots W_K(t_K)|^{\alpha} \le C$  for any $t_1, \ldots, t_K$.


Random sequences which admit a representation of the form of (2.1), with
$K = 1$, have been called by Bartlett ([1], p. 146) linear processes. The interest
of multilinear processes derives from the fact that they possess certain closure
properties, if it is assumed that the random variables $W_i(t)$ which occur in the
definition of the multilinear processes possess moments of sufficiently high order.
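
To make representation (2.1) concrete, the following sketch builds a process with $K = 2$ from a single independent Gaussian sequence, taking $W_1(t) = W_2(t)$ (the definition only requires the vectors $W(t)$ to be independent across $t$). The coefficient table, the cutoff `M`, the seed, and the function name `multilinear_value` are illustrative assumptions. Because the coefficients vanish outside $|\nu_i| \le M$, each $X(t)$ depends only on the inputs at times $t - M, \ldots, t + M$, which the last lines check by perturbing one input.

```python
import random

def multilinear_value(w, coeffs, t):
    """X(t) = sum of a(v1, v2) * w[t - v1] * w[t - v2] over a finite coefficient table."""
    return sum(a * w[t - v1] * w[t - v2] for (v1, v2), a in coeffs.items())

M = 2                                    # hypothetical cutoff: a(v1, v2) = 0 when |v_i| > M
coeffs = {(v1, v2): 0.5 ** (abs(v1) + abs(v2))
          for v1 in range(-M, M + 1) for v2 in range(-M, M + 1)}

rng = random.Random(0)
n = 200
# Independent inputs W(s) for every time index the sums below can touch.
w = {s: rng.gauss(0.0, 1.0) for s in range(1 - M, n + M + 1)}
x = [multilinear_value(w, coeffs, t) for t in range(1, n + 1)]

# Perturb the single input W(50) and recompute the whole path.
w_perturbed = dict(w)
w_perturbed[50] += 1.0
x_perturbed = [multilinear_value(w_perturbed, coeffs, t) for t in range(1, n + 1)]
changed = [t for t in range(1, n + 1) if x_perturbed[t - 1] != x[t - 1]]
print(changed)
```

Only times within `M` of the perturbed index can change, which is the finite-range dependence exploited for the truncated process in the proof of Section 3.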

By a linear filter is meant a linear transformation on the space of bounded
doubly infinite sequences $\{X(t),\ t = 0, \pm 1, \ldots\}$ to itself, of the form

(2.5)    $Y(t) = \sum_{\nu = -\infty}^{\infty} k(\nu)\, X(t - \nu),$

where

(2.6)    $\sum_{\nu = -\infty}^{\infty} |k(\nu)| < \infty.$


Multilinear processes are closed under the operation of linear filtering, in the 
sense that if X(t) is a multilinear process, then so is Y(t). 

It may be verified that powers of multilinear processes are multilinear processes.
More generally, polynomials in multilinear processes

    $P[X(t)] = c_n X^n(t) + \cdots + c_1 X(t) + c_0$

are multilinear processes.

Next, we note that the sum and product of the multilinear process $X(t)$, and
a different multilinear process

    $X'(t) = \sum a'(\nu_1, \ldots, \nu_{K'})\, W'_1(t - \nu_1) \cdots W'_{K'}(t - \nu_{K'}),$

is again a multilinear process if the $(K + K')$-dimensional random vectors $[W_1(t),
\ldots, W_K(t), W'_1(t), \ldots, W'_{K'}(t)]$ form an independent sequence. Since the process
defined by $X'(t) = X(t + \tau)$, where $\tau$ is fixed, satisfies this condition, it follows
that $X(t)X(t + \tau)$ is a multilinear process.

The model of a multilinear process appears to be applicable to many stochastic 
processes which are physically observed. For many physical processes may be 
regarded as arising from independent random variables through a finite bank 
of linear filters and non-linear power law interactions which represent on the 
one hand the effect of measuring devices, and on the other hand the transmission 
properties of nature from the regions where the independent events occurred to 
where they (or their superpositions) are observed. 


3. A central limit theorem.
THEOREM. Let $X(t)$ be a multilinear process, in the sense that it admits of a representation
of the form of (2.1), with (2.2), (2.3) and (2.4) all being true. Then (1.1)
holds if

(3.1)    $\liminf_{n \to \infty} \dfrac{1}{\sqrt{n}}\, \sigma[S(n)] > 0.$


PROOF: For any positive integer $M$, let $V_M$ be the set of $K$-tuples $(\nu_1, \ldots, \nu_K)$
whose components $\nu_i$ are integers and satisfy $|\nu_i| \le M$. Define, for $t = 0, \pm 1, \ldots$,

(3.2)    $X_M(t) = \sum_{(\nu_1, \ldots, \nu_K) \in V_M} a(\nu_1, \ldots, \nu_K)\, W_1(t - \nu_1) \cdots W_K(t - \nu_K).$

Define the consecutive sums
    $S_M(n) = X_M(1) + \cdots + X_M(n)$
and let $R_M(n) = S(n) - S_M(n)$. To prove the theorem we use the method of
iterated probability limits introduced by Marsaglia [3]. We will show that
(i) for every $M$ greater than some $M_0$, (1.1) is satisfied by the $X_M(t)$; that
is, the random sequence $(S_M(n) - ES_M(n))/\sigma[S_M(n)]$ is asymptotically normal
with mean 0 and variance 1;

(ii)    $\lim_{M \to \infty} \limsup_{n \to \infty} \dfrac{\sigma[R_M(n)]}{\sigma[S(n)]} = 0;$

(iii)    $\lim_{M \to \infty} \limsup_{n \to \infty} \left|1 - \dfrac{\sigma[S_M(n)]}{\sigma[S(n)]}\right| = 0.$

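In view of Theorems 1 and 2 of Marsaglia [3] the validity of these facts implies
the validity of the theorem. To prove (ii) and (iii), we use the inequalities, for
any random variables $X$ and $Y$,

(3.4)    $\sigma[X + Y] \le \sigma[X] + \sigma[Y],$
(3.5)    $|\sigma[X] - \sigma[Y]| \le \sigma[X - Y].$

From (3.5) it follows that (ii) implies (iii). To establish (ii), we write

    $R_M(n) = \sum_{(\nu_1, \ldots, \nu_K) \notin V_M} a(\nu_1, \ldots, \nu_K) \sum_{t=1}^{n} W_1(t - \nu_1) \cdots W_K(t - \nu_K).$

Consequently, by (3.4),

(3.6)    $\dfrac{1}{\sqrt{n}}\, \sigma[R_M(n)] \le \sum_{(\nu_1, \ldots, \nu_K) \notin V_M} |a(\nu_1, \ldots, \nu_K)|\; \sigma\!\left[\dfrac{1}{\sqrt{n}} \sum_{t=1}^{n} W_1(t - \nu_1) \cdots W_K(t - \nu_K)\right].$

Now

(3.7)    $\sigma^2\!\left[\dfrac{1}{\sqrt{n}} \sum_{t=1}^{n} W_1(t - \nu_1) \cdots W_K(t - \nu_K)\right]
          = \dfrac{1}{n} \sum_{t=1}^{n} \sigma^2[W_1(t - \nu_1) \cdots W_K(t - \nu_K)]$
         $\qquad + \dfrac{2}{n} \sum_{s=1}^{n} \sum_{t=s+1}^{n} \mathrm{cov}[W_1(s - \nu_1) \cdots W_K(s - \nu_K),\ W_1(t - \nu_1) \cdots W_K(t - \nu_K)].$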
For fixed $s, \nu_1, \ldots, \nu_K$, the covariance in (3.7) vanishes for all $t$ except perhaps
for $t$ such that $t = s + \nu_i - \nu_j$ for some $i, j = 1, \ldots, K$; there are at most $K^2$
such values of $t$.
Now, in view of (2.4), there is a number $C_1$ such that

(3.8)    $\sigma[W_1(t_1) \cdots W_K(t_K)] \le C_1$  for all $t_1, \ldots, t_K$.

Consequently, the variance on the left-hand side of (3.7) is less than $C_1^2\{1 + 2K^2\}$,
which is less than $4C_1^2K^2$. Further, from (3.1), it follows that there is a positive
constant $B'$ such that for all $n$,

(3.9)    $\dfrac{\sigma[S(n)]}{\sqrt{n}} \ge B'.$

Therefore

(3.10)    $\dfrac{\sigma[R_M(n)]}{\sigma[S(n)]} \le \dfrac{1}{B'}\, \dfrac{1}{\sqrt{n}}\, \sigma[R_M(n)] \le \dfrac{2C_1K}{B'} \sum_{(\nu_1, \ldots, \nu_K) \notin V_M} |a(\nu_1, \ldots, \nu_K)|.$

From (3.10) and (2.2) one may infer (ii).

Next, to show (i), we note that the $X_M(t)$ form a $2M$-dependent sequence of
random variables. From Marsaglia [3], it follows that a sufficient condition for
the $X_M(t)$ to obey the central limit theorem, and thus for (i) to be established,
is that for some positive constant $B_1$,

(3.11)    $\dfrac{1}{\sqrt{n}}\, \sigma[S_M(n)] \ge B_1$

and for some $\alpha > 2$ and constant $C_2$,

(3.12)    $E\,|X_M(t) - EX_M(t)|^{\alpha} \le C_2$  for all $t$.

For $M$ large enough, (3.11) follows from (3.1), (3.10) and (3.5). By Minkowski's
inequality, (3.12) follows from (3.2) and (2.4). The proof of the theorem is now
completed.


4. A remark on applications. One use of the foregoing central limit theorem 
is to provide conditions, without any further ado, for the asymptotic normality 
of various estimates of the spectrum of a stationary time series that have been 
considered by us (see [4]). 


REFERENCES 
[1] M. S. Bartlett, An Introduction to Stochastic Processes, Cambridge, 1955.
[2] P. H. Diananda, "Some probability limit theorems with statistical applications,"
Proc. Cambridge Philos. Soc., Vol. 49 (1953), pp. 239-246.
[3] G. Marsaglia, "Iterated limits and the central limit theorem for dependent random
variables," Proc. Amer. Math. Soc., Vol. 5 (1954), pp. 987-991.
[4] E. Parzen, "On consistent estimates of the spectrum of a stationary time series,"
to be published.


ON THE ENUMERATION OF DECISION PATTERNS INVOLVING
n MEANS¹


By R. L. Wine² and John E. Freund

Virginia Polytechnic Institute


1. Introduction. The purpose of this paper is to provide a mathematical
treatment for the enumeration of decision patterns obtained in the pairwise
comparison of n sample means. In the comparison of n means, there are altogether
$\binom{n}{2}$ pairwise comparisons, and each individual comparison between two means,
say $m_1$ and $m_2$, must result in the decision that $m_1$ is significantly less than $m_2$,
that $m_2$ is significantly less than $m_1$, or that there is no significant difference.
Symbolically, these three alternatives are written as $m_1 < m_2$, $m_2 < m_1$, and
$m_1 = m_2$, respectively.

There are, thus, altogether $3^{\binom{n}{2}}$ possible decision sets in the comparison of
n objects, a decision set consisting of the $\binom{n}{2}$ pairwise comparisons. However,
for the comparison of n means, there are fewer decision sets since circularities are
automatically ruled out.


Received May 14, 1956; revised July 6, 1956. 

1 Research sponsored by the Office of Ordnance Research Contract DA-36-034-ORD-1477 
U.S. Army. 

2 This paper is a section of R. L. Wine’s Ph.D. dissertation. 




A decision set involving n means can be represented symbolically with the 
use of the following scheme: 

If $m_1 < m_2$, the letter $m_1$ is written to the left of $m_2$; if $m_1 = m_2$, the letters
$m_1$ and $m_2$ are underlined with what we shall call an indifference line and it does
not matter whether we write $m_1 m_2$ or $m_2 m_1$. We shall also write $m_1 m_2 m_3$, with one
common indifference line, to express the fact that $m_1 = m_2$, $m_2 = m_3$, and $m_1 = m_3$.
In general, the fact that $m_i = m_j$ will be expressed by an indifference line common
to $m_i$ and $m_j$.

The following are two simple examples illustrating this representation of decision
sets:

(i) The decision set $m_1 < m_2$, $m_1 < m_3$, $m_1 < m_4$, $m_2 = m_3$, $m_2 < m_4$ and
$m_3 = m_4$ is written as

    m_1  m_2  m_3  m_4
         --------
              --------

and

(ii) the decision set $m_1 = m_2$, $m_1 = m_3$, $m_1 < m_4$, $m_2 = m_3$, $m_2 = m_4$, and
$m_3 = m_4$ is written as

    m_1  m_2  m_3  m_4
    -------------
         -------------


If the only difference between the schematic presentation of two decision sets 
is a permutation of the means m, , mz, m;,--- , m,, they are said to belong to 
the same decision pattern. A decision pattern is, therefore, characterized by the 
number of means and the number and the arrangement of the indifference lines. 
The decision pattern corresponding to a given decision set will be indicated by
replacing the means with dots. The decision pattern corresponding to the first
example above is

    .    .    .    .
         ------
              ------

and that of the second example is

    .    .    .    .
    -----------
         -----------

An important point which must be observed in the construction of decision 
patterns is that no indifference line is completely covered by another indifference 
line. 

Having defined decision patterns and decision sets, one may now ask 

(a) What is the total number of distinct decision sets in the pairwise comparison
of n means?

(b) What is the total number of distinct decision patterns in the pairwise
comparison of n means?

In this paper it will be shown that the number of decision patterns involving
n sample means is

(1.1)    $f(n) = \dfrac{1}{n + 1} \binom{2n}{n}.$

Although question (a) can, of course, be answered by direct enumeration for
small values of n, the general problem is as yet unsolved.







2. Derivation of formula for number of decision patterns. In order to derive
formulas giving the total number of decision patterns, consider the last k dots
on the right in a pattern which has n dots, $n \ge k$. Beneath each pair of dots, $a_i$
and $a_{i+1}$, where $i = 1, 2, \ldots, k - 1$, $j$ line segments, $j = 0, 1, 2, \ldots$, may be
drawn, each line being part of an indifference line, or a whole one. The step from
$a_i$ to $a_{i+1}$, called the "i-th step," may be made in many ways, as indicated by
the number of line segments underlining the pair of dots. Let $s_j$ denote a step
with $j$ line segments. It should be noted that each line segment under dots $a_i$ and
$a_{i+1}$ may or may not be part of an indifference line including several other dots.
A dot $a_i$ is called a "right terminal dot" ("left terminal dot") of an indifference
line whenever the indifference line does not extend to $a_{i+1}$ ($a_{i-1}$).

Let $f_j(k)$, $j = 0, 1, 2, \ldots$, denote the total number of decision patterns possible
when the first step of k dots is $s_j$.

It can be seen easily that

(2.1)    $f_0(k) = f_0(k - 1) + f_1(k - 1), \quad k \ge 3,$

since the number of decision patterns for k dots with first step $s_0$ is the same as
the sum of the number of decision patterns for k - 1 dots with the first steps
$s_0$ and $s_1$ (k cannot be $\le 2$, since $f_1(k - 1)$ would be undefined).

A general recursion formula for $f_e(k)$ with $e = 1, 2, 3, \ldots$ may be written as

(2.2)    $f_e(k) = f_{e-1}(k - 1) + 2f_e(k - 1) + f_{e+1}(k - 1), \quad k \ge 3.$


To prove (2.2), assume that $s_e$ is the first step, in which case it is necessary that
$n \ge k + e - 1$; n must be large enough so that no two of the e indifference lines
of the first step have $a_1$ or any dot to the left of $a_1$ as a common left terminal dot.
At least e - 1 of the indifference lines in step one must be continued beyond $a_2$,
since two indifference lines can not have a common right terminal dot at $a_2$.
The second step, thus, has at least e - 1 indifference lines. On the other hand,
not more than e + 1 indifference lines are possible in the second step, since two
indifference lines would otherwise have $a_2$ as a common left terminal dot. Thus,
if the first step is $s_e$ and only one of its indifference lines terminates at $a_2$, the
second step is $s_{e-1}$ or $s_e$; if the first step is such that no indifference lines terminate
at $a_2$, the second step is $s_e$ or $s_{e+1}$.

For certain values of k, $s_e$ is an impossible first step, and $f_e(k)$ is equal to zero.
(It will be assumed here that n is sufficiently large so that not more than one
indifference line has a left terminal dot at $a_1$.) If k is an arbitrary positive integer,
say r, then $s_{r-1}$ is a possible first step since each $a_i$, where $i = 2, 3, \ldots, r$, may
be a right terminal dot for exactly one indifference line. If the first step were $s_r$,
then some point $a_i$ would have to be a common right terminal dot for two
indifference lines. Thus $f_e(r) = 0$, when $e = r, r + 1, \ldots$, and, in general,

(2.3)    $f_e(k) = 0,$

where $e \ge k \ge 1$.







Let f(n) denote the total number of decision patterns for n means. Clearly,

(2.4)    $f(n) = f_0(n) + f_1(n),$

since $s_0$ and $s_1$ are the only possible first steps.

Since f(n) depends only on $f_0(n)$ and $f_1(n)$, equations (2.1), (2.2), and (2.3),
together with the boundary conditions

(2.5)    $f_0(1) = f_0(2) = f_1(2) = 1,$

will lead to (1.1).
Using standard techniques for solving difference equations, it can be shown
that³

(2.6)    $f_e(k) = \dfrac{2e + 1}{e + k} \binom{2k - 2}{k + e - 1}.$

This result can be verified by substituting (2.6) into equations (2.1), (2.2),
(2.3), and (2.5). It follows immediately that

    $f(n) = f_0(n) + f_1(n) = \dfrac{1}{n + 1} \binom{2n}{n}.$
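
The substitution check mentioned above is easy to carry out numerically. The sketch below evaluates $f_e(k)$ from the recursion (2.1), (2.2) with the zero rule (2.3) and boundary values (2.5), compares it with the closed form (2.6), and compares $f(n)$ with (1.1). As an extra check it counts patterns by brute force, under the assumption (our encoding, not the paper's) that a pattern on n dots is a set of indifference lines, each an interval covering at least two consecutive dots, no one of which covers another. The function names are illustrative.

```python
from functools import lru_cache
from itertools import combinations
from math import comb

@lru_cache(maxsize=None)
def f(e, k):
    """f_e(k) from the recursion (2.1)-(2.2), zero rule (2.3) and boundary values (2.5)."""
    if e >= k:
        return 0                                   # (2.3)
    if k <= 2:
        return 1                                   # (2.5): f_0(1) = f_0(2) = f_1(2) = 1
    if e == 0:
        return f(0, k - 1) + f(1, k - 1)           # (2.1)
    return f(e - 1, k - 1) + 2 * f(e, k - 1) + f(e + 1, k - 1)   # (2.2)

def closed_form(e, k):
    """Right-hand side of (2.6); math.comb returns 0 when e >= k."""
    return (2 * e + 1) * comb(2 * k - 2, k + e - 1) // (e + k)

def total_patterns(n):
    """f(n) = f_0(n) + f_1(n), equation (2.4)."""
    return f(0, n) + f(1, n)

def brute_force_patterns(n):
    """Count sets of intervals on n dots in which no interval covers another (assumed encoding)."""
    intervals = [(i, j) for i in range(n) for j in range(i + 1, n)]
    def covers(a, b):
        return a[0] <= b[0] and b[1] <= a[1]
    count = 0
    for size in range(len(intervals) + 1):
        for subset in combinations(intervals, size):
            if all(not covers(a, b) and not covers(b, a)
                   for a, b in combinations(subset, 2)):
                count += 1
    return count

for n in range(1, 6):
    print(n, total_patterns(n), comb(2 * n, n) // (n + 1), brute_force_patterns(n))
```

The three columns agree for the values of n printed; the brute-force count grows quickly and is meant only for small n.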




PERCENTILES OF THE $\omega_n$ STATISTIC¹


By B. Sherman²
University of California, Los Angeles


If n points are selected independently from a uniform distribution on a unit
interval there arise n + 1 subintervals, each of expected length 1/(n + 1). If
$L_k$ is the length of the kth interval from the left, then

    $\omega_n = \dfrac{1}{2} \sum_{k=1}^{n+1} \left| L_k - \dfrac{1}{n + 1} \right|.$

The distribution function of $\omega_n$ is 0 for $x < 0$, 1 for $x > n/(n + 1)$, and for
$0 \le x \le n/(n + 1)$

    $F_n(x) = b_n x^n + b_{n-1} x^{n-1} + \cdots + b_1 x + b_0 + 1,$

where

    $b_k = \sum_{q=0}^{r} (-1)^{q+k+1} \binom{n+1}{q+1} \binom{q+k}{k} \binom{n}{k} \left(\dfrac{n - q}{n + 1}\right)^{n-k},$


3 The authors are indebted to Dr. Leo Moser of the University of Alberta for suggestions
leading to a simplification of part of this proof. 

Received March 29, 1956; revised October 22, 1956. 

1 This paper was prepared under the sponsorship of the Office of Naval Research and 
the Office of Ordnance Research, U.S. Army. Reproduction in whole or in part is permitted 
for any purpose of the United States Government. 

2 Present address, Westinghouse Research Laboratories, Pittsburgh 35, Pennsylvania. 





the upper limit of summation being determined by

    $\dfrac{n - r - 1}{n + 1} \le x < \dfrac{n - r}{n + 1}.$


This distribution function has been derived in [1]. Further discussion may
be found in [2]. The statistical uses of $\omega_n$ are discussed in [1] and [3].

Using the SWAC, the high-speed automatic computer at Numerical Analysis
Research, the 99, 95, and 90 per cent points of $\omega_n$ have been obtained for n
ranging from 3 to 20. These results appear in Table 1 below, with values for
n = 1 and n = 2 also included. Technical details of the calculation may be
found in [4], but we give a brief summary here. In the range
$(n - 1)/(n + 1) \le x < n/(n + 1)$ the distribution function $F_n(x)$ is given by a
single polynomial, and we know that $F_n(n/(n + 1)) = 1$. We then calculate
$F_n((n - 1)/(n + 1))$, and if this value is less than .99, we know the 99 per cent
point of $\omega_n$ lies in the range $(n - 1)/(n + 1) \le x \le n/(n + 1)$. We solve the
polynomial equation $F_n(x) = .99$ to obtain it. If $F_n((n - 1)/(n + 1))$ is less
than .95 or .90, we obtain the corresponding per cent point by solving the
equation $F_n(x) = .95$ or $.90$. If $F_n((n - 1)/(n + 1))$ is greater than .99, we
determine $F_n(x)$ in the range $(n - 2)/(n + 1) \le x < (n - 1)/(n + 1)$ and
calculate $F_n((n - 2)/(n + 1))$, thus determining which of the per cent points
lies in this range. Proceeding in this manner we obtain, for each n, the three
per cent points.
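The interval-by-interval search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `percent_point` is hypothetical, bisection is swapped in for the polynomial solution actually carried out on the SWAC, and the only distribution function supplied is the case n = 1, where (assuming the definition of the statistic given in [1]) $\omega_1 = |U - 1/2|$ for a uniform U, so that $F_1(x) = 2x$ on $0 \le x \le 1/2$.

```python
def percent_point(F, n, p, tol=1e-12):
    """Locate x with F(x) = p by stepping down through the knots k/(n+1).

    F   : distribution function of the statistic (assumed non-decreasing)
    n   : sample size; the knots are k/(n+1), k = n, n-1, ..., 0
    p   : required probability level, e.g. 0.99
    Bisection replaces the polynomial solution used in the paper.
    """
    hi = n / (n + 1)          # F(hi) = 1 at the top knot
    lo = 0.0
    for k in range(n - 1, -1, -1):
        lo = k / (n + 1)
        if F(lo) < p:         # the per cent point lies in [lo, hi]
            break
        hi = lo               # otherwise move one interval down
    while hi - lo > tol:      # bisection inside the bracketing interval
        mid = 0.5 * (lo + hi)
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def F1(x):
    """Distribution function for n = 1 (assumption: omega_1 = |U - 1/2|)."""
    return min(max(2.0 * x, 0.0), 1.0)


points = [percent_point(F1, 1, p) for p in (0.99, 0.95, 0.90)]
```

For n = 1 the three values agree with the first row of Table 1 (.49500, .47500, .45000); for larger n one would have to supply the piecewise polynomial $F_n$ from [1].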


TABLE 1

99, 95, 90 Percentiles of $\omega_n$

 n      99       95       90

 1    .49500   .47500   .45000
 2    .60893   .53757   .48410
 3    .61428   .51792   .46673
 4    .58870   .50955   .46850
 5    .57442   .50181   .46195
 6    .56263   .49398   .45847
 7    .55128   .48801   .45434
 8    .54241   .48243   .45100
 9    .53435   .47772   .44786
10    .52743   .47346   .44510
11    .52126   .46970   .44257
12    .51577   .46630   .44029
13    .51082   .46323   .43820
14    .50634   .46043   .43628
15    .50225   .45786   .43452
16    .49851   .45550   .43288
17    .49506   .45332   .43137
18    .49188   .45130   .42995
19    .48892   .44942   .42863
20    .48617   .44766   .42739
$\omega_n$ STATISTIC


TABLE 2

99, 95, 90 Percentiles of $\omega_n'$ and $\omega_n''$

 n    $\omega_n'$ (99)   $\omega_n''$ (99)   $\omega_n'$ (95)   $\omega_n''$ (95)

 5      2.4723       1.9000       1.7228       1.2321
10      2.4417       2.0758       1.6969       1.3736
15      2.4238       2.1415       1.6874       1.4338
16      2.4206       2.1497       1.6861       1.4420
17      2.4181       2.1574       1.6848       1.4494
18      2.4158       2.1644       1.6837       1.4561
19      2.4137       2.1707       1.6827       1.4622
20      2.4117       2.1764       1.6817       1.4678
 ∞      2.3264       2.3264       1.6449       1.6449

The random variable $\omega_n$ has mean

$$E_n = \left(\frac{n}{n+1}\right)^{n+1} \to \frac{1}{e}$$

and variance

$$D_n^2 = \frac{2n^{n+2} + n(n-1)^{n+2}}{(n+2)(n+1)^{n+2}} - \left(\frac{n}{n+1}\right)^{2n+2} \sim \frac{2e - 5}{e^2 n},$$

and it is proved in [1] that $\omega_n' = (1/D_n)(\omega_n - E_n)$ tends to be normally
distributed as $n \to \infty$. It is therefore also true that

$$\omega_n'' = \frac{e\sqrt{n}}{\sqrt{2e - 5}}\left(\omega_n - \frac{1}{e}\right)$$

tends to be normally distributed as $n \to \infty$. The rapidity with which this
convergence occurs is indicated in Table 2, which gives the 99, 95, and 90
percentiles of $\omega_n'$ and $\omega_n''$ for n = 5, n = 10, n ranging from 15 to 20, and
n = ∞, i.e., the corresponding percentiles of a normal variate with mean 0 and
variance 1. The percentiles of $\omega_n'$ are reasonably close to the limiting values
but those of $\omega_n''$ are not. In either case the convergence is slow. It appears
that n would have to be greater than 100 before the 99, 95, 90 percentiles of
$\omega_n'$ and $\omega_n''$ are within one per cent of the limiting normal values. We note
that the percentiles of $\omega_n'$ and $\omega_n''$ are, respectively, decreasing and
increasing to the limiting values.
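As a rough numerical illustration of the two standardizations, the following Python sketch evaluates the mean and variance and converts a percentile of $\omega_n$ into the corresponding percentiles of $\omega_n'$ and $\omega_n''$. The function names are hypothetical, and the formulas coded here are the exact mean and variance of $\omega_n$ together with the limiting constants $1/e$ and $(2e - 5)/(e^2 n)$; treat it as a check under those assumptions rather than as the computation used by the author.

```python
import math


def mean_omega(n):
    """E_n = (n / (n + 1)) ** (n + 1)."""
    return (n / (n + 1)) ** (n + 1)


def var_omega(n):
    """D_n^2, the exact variance of omega_n."""
    first = (2 * n ** (n + 2) + n * (n - 1) ** (n + 2)) / ((n + 2) * (n + 1) ** (n + 2))
    return first - (n / (n + 1)) ** (2 * n + 2)


def omega_prime(n, w):
    """Standardize with the exact mean and standard deviation."""
    return (w - mean_omega(n)) / math.sqrt(var_omega(n))


def omega_double_prime(n, w):
    """Standardize with the limiting mean 1/e and variance (2e - 5)/(e^2 n)."""
    return math.e * math.sqrt(n) * (w - 1 / math.e) / math.sqrt(2 * math.e - 5)


# 99 per cent points of omega_n for n = 5 and n = 20, taken from Table 1.
w5_99, w20_99 = 0.57442, 0.48617
row5 = (omega_prime(5, w5_99), omega_double_prime(5, w5_99))
row20 = (omega_prime(20, w20_99), omega_double_prime(20, w20_99))
```

With the Table 1 entries as input, `row5` and `row20` come out close to the tabulated 99 per cent values of Table 2, which also makes the gap between the two standardizations at moderate n easy to see.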


REFERENCES 

[1] B. SHERMAN, "A random variable related to the spacing of sample values," Annals of
Math. Stat., Vol. 21 (1950), pp. 339-361.

[2] D. A. DARLING, "On a class of problems related to the random division of an interval,"
Annals of Math. Stat., Vol. 24 (1953), pp. 239-253.

[3] D. J. BARTHOLOMEW, "Note on the use of Sherman's statistic as a test of randomness,"
Biometrika, Vol. 41, pp. 556-558.

[4] B. SHERMAN, "Determination of three percentiles of a distribution function," to appear
in the first issue of the Numerical Analysis Research publication.







ON THE SERIAL TEST FOR RANDOM SEQUENCES 
By I. J. Good


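Let $a_1, a_2, \cdots, a_N$ be a finite random sequence, $S$, of independent random
variables with $P(a_j = r) = p_r$ $(j = 1, 2, \cdots, N;\ r = 0, 1, \cdots, t - 1)$, where
$\sum p_r = 1$. We call each of $0, 1, \cdots, t - 1$ "digits." Associated with $S$ is the
corresponding cyclic sequence $\bar{S}$, defined by regarding the first digit of $S$ as
immediately following the last one. We shall always denote properties of $\bar{S}$ by
placing a bar over the corresponding algebraic symbol relating to $S$.

A sequence of $\nu$ digits is called a $\nu$-sequence. A $\nu$-sequence is said to belong
to $S$ if it is of the form $a_j, a_{j+1}, \cdots, a_{j+\nu-1}$ $(j = 1, 2, \cdots, N - \nu + 1)$ and
to belong to $\bar{S}$ if j is also allowed to take the values $N - \nu + 2, N - \nu + 3,
\cdots, N$, where $a_{N+k}$ is identified with $a_k$ for all integers k. Let
$n_{r_1, r_2, \cdots, r_\nu}$, or $n_{\mathbf{r}}$ for short, be the number of $\nu$-sequences in $S$ which are
the $\nu$-sequence $(r_1, r_2, \cdots, r_\nu) = \mathbf{r}$ (where $r_1, r_2, \cdots,$ and $r_\nu$ are digits). Let

$$p_{\mathbf{r}} = p_{r_1} p_{r_2} \cdots p_{r_\nu}, \qquad (1)$$

$$\psi_\nu^2 = \sum_{\mathbf{r}} \frac{\{n_{\mathbf{r}} - (N - \nu + 1)p_{\mathbf{r}}\}^2}{(N - \nu + 1)p_{\mathbf{r}}}, \qquad
\bar\psi_\nu^2 = \sum_{\mathbf{r}} \frac{(\bar{n}_{\mathbf{r}} - Np_{\mathbf{r}})^2}{Np_{\mathbf{r}}},$$

where $\mathbf{r}$ runs through all its $t^\nu$ possible values.
We shall prove that if¹

$$\nu \le \tfrac{1}{2}(N + 1), \qquad (5)$$

then

$$\mathcal{E}(\psi_\nu^2) = t^\nu - 1 \qquad (6)$$

and

$$\mathcal{E}(\bar\psi_\nu^2) = t^\nu - 1. \qquad (7)$$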
The special case $p_0 = p_1 = \cdots = p_{t-1} = t^{-1}$ was dealt with by Good [1], who
also proved some asymptotic results that have been generalized to arbitrary
random sequences by Billingsley [2]. When we apply these asymptotic results
to actual (finite) values of N, our confidence in their approximate validity is
increased on finding that equations (6) and (7) are exact.
We mention in passing that the variances of $\psi_\nu^2$ and $\bar\psi_\nu^2$ are asymptotically
2 4 » : 
7 ON +t — (2y + 1)t + 2 — 1), (8) 


but we do not know how good the approximation is for actual values of N.


Received April 4, 1956. 
1 The condition (5) was also required by Good [1], but in error it was not mentioned. 







The results can be generalized to random arrays (rectangles or "blocks") in
two or more dimensions, with circularization in any number of these dimensions.
(The word "dimension" is here being used in its obvious Euclidean sense
and not in the sense of Good [1], p. 280.) The length, breadth, height, $\cdots$ of
the whole array may be denoted by $N_1, N_2, N_3, \cdots$. A $\nu$-sequence or $\nu$-block
is then defined as an array of digits forming a block of length, breadth, height,
$\cdots$, equal to $\nu_1, \nu_2, \nu_3, \cdots$ digits, and with the obvious definitions of
$\psi_\nu^2$ and $\bar\psi_\nu^2$ it can be proved that if

$$\nu_1 \le \tfrac{1}{2}(N_1 + 1), \quad \nu_2 \le \tfrac{1}{2}(N_2 + 1), \ \cdots, \qquad (9)$$

then

$$\mathcal{E}(\psi_\nu^2) = t^{\nu_1 \nu_2 \cdots} - 1 \qquad (10)$$

and

$$\mathcal{E}(\bar\psi_\nu^2) = t^{\nu_1 \nu_2 \cdots} - 1. \qquad (11)$$

The proofs of these multidimensional results are essentially the same as for the
one-dimensional case, but involve extra notation that tends to obscure the
issue. We shall therefore content ourselves with giving the proof for the
one-dimensional case only.

If $1 \le j \le j + \nu - 1 \le N$ (and continuing to regard $a_1, a_2, \cdots, a_N$ as
random variables), let

$$P(a_j = r_1, \cdots, a_{j+\nu-1} = r_\nu) = P(\mathbf{r}),$$

and if $1 \le j$, $1 \le j + m$ (where m may be negative), $j + m + \nu - 1 \le N$,
let

$$P(a_{j+m} = r_1, a_{j+m+1} = r_2, \cdots, a_{j+m+\nu-1} = r_\nu \mid a_j = r_1, \cdots, a_{j+\nu-1} = r_\nu)
= P_m(r_1, \cdots, r_\nu) = P_m(\mathbf{r}),$$

the probability that a $\nu$-sequence agrees with an assigned one m places to its
left. Everything will depend on the following lemma proved at the end of the
paper.

LEMMA. If $\nu + |m| \le N$, then

$$\sum_{\mathbf{r}} P_m(\mathbf{r}) = \begin{cases} 1 & \text{if } m \ne 0, \\ t^\nu & \text{if } m = 0, \end{cases}$$

where $\mathbf{r}$ runs through all of its $t^\nu$ possible values.
We shall now prove equation (6).
By (12) and (14) of Good [1], we have

$$\mathcal{E}(n_{\mathbf{r}}) = (N - \nu + 1)P(\mathbf{r}),$$

$$V(n_{\mathbf{r}}) = P(\mathbf{r}) \sum_{|m| < \nu} (N - \nu + 1 - |m|)(P_m(\mathbf{r}) - P(\mathbf{r})).$$

Therefore

$$\mathcal{E}(\psi_\nu^2) = \frac{1}{N - \nu + 1} \sum_{\mathbf{r}} \sum_{|m| < \nu} (N - \nu + 1 - |m|)(P_m(\mathbf{r}) - P(\mathbf{r}))$$

$$= \frac{1}{N - \nu + 1} \sum_{|m| < \nu} (N - \nu + 1 - |m|) \sum_{\mathbf{r}} (P_m(\mathbf{r}) - P(\mathbf{r})).$$

By the lemma, the inner sum vanishes except when m = 0, and equation (6)
follows immediately. Equation (7) may be proved in a similar but slightly
simpler manner, and the proof will be omitted. We now prove the lemma.

PROOF. Negative values of m may be treated by the same method as positive
values, and the case m = 0 is trivial since $P_0(\mathbf{r}) = 1$, so we shall suppose m
to be positive. The lemma is also trivial if $|m| \ge \nu$, since $P_m(\mathbf{r})$ is then equal
to $P(\mathbf{r})$, but in the application $|m| < \nu$, so we suppose $1 \le m < \nu$. We may
suppose j = 1 without loss of generality.

By the multiplicative axiom of probability, we have

$$P_m(\mathbf{r}) = P(a_{1+m} = r_1, a_{2+m} = r_2, \cdots, a_\nu = r_{\nu-m} \mid a_1 = r_1, \cdots, a_\nu = r_\nu)$$
$$\times\, P(a_{\nu+1} = r_{\nu+1-m}, \cdots, a_{\nu+m} = r_\nu \mid
a_1 = r_1, \cdots, a_\nu = r_\nu;\ a_{1+m} = r_1, \cdots, a_\nu = r_{\nu-m}).$$

The first factor is either 0 or 1, depending on whether the following conditions
are not or are satisfied:

$$r_{1+m} = r_1, \quad r_{2+m} = r_2, \quad \cdots, \quad r_\nu = r_{\nu-m}.$$

When these conditions are satisfied, the second factor is simply equal to
$P(a_{\nu+1} = r_{\nu+1-m}, \cdots, a_{\nu+m} = r_\nu)$, since $S$ is a random sequence. Therefore

$$\sum_{\mathbf{r}} P_m(\mathbf{r}) = \sum_{\substack{r_{1+m} = r_1, \cdots, \\ r_\nu = r_{\nu-m}}} P(a_{\nu+1} = r_{\nu+1-m}, \cdots, a_{\nu+m} = r_\nu)$$

$$= \sum_{r_{\nu+1-m}, \cdots, r_\nu} P(a_{\nu+1} = r_{\nu+1-m}, \cdots, a_{\nu+m} = r_\nu),$$

since the conditions above the sign of summation in the previous line determine
$r_1, r_2, \cdots, r_{\nu-m}$ uniquely in terms of $r_{\nu+1-m}, \cdots, r_\nu$. Our sum is now merely
the total probability of all $t^m$ possible m-sequences and is therefore unity.
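The exactness of (6) and (7) can be checked by brute force for small cases. The Python sketch below is only an illustration: the function name `expected_psi2` is hypothetical, the statistic is coded as a chi-squared-type sum over all $t^\nu$ possible $\nu$-sequences (normalized by the number of windows, N − ν + 1 for the open sequence and N for the cyclic one, which is an assumption about the definitions), and exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import product


def expected_psi2(p, N, nu, cyclic):
    """Exact expectation of the serial statistic by enumerating all t**N sequences.

    p      : list of digit probabilities (Fractions summing to 1)
    N      : length of the random sequence
    nu     : length of the nu-sequences being counted
    cyclic : True for the circularized sequence, False for the open one
    """
    t = len(p)
    windows = N if cyclic else N - nu + 1
    total = Fraction(0)
    for seq in product(range(t), repeat=N):
        prob = Fraction(1)
        for digit in seq:
            prob *= p[digit]
        # Count occurrences of every nu-sequence (indices wrap only if cyclic).
        counts = {}
        for j in range(windows):
            window = tuple(seq[(j + k) % N] for k in range(nu))
            counts[window] = counts.get(window, 0) + 1
        psi2 = Fraction(0)
        for r in product(range(t), repeat=nu):
            p_r = Fraction(1)
            for digit in r:
                p_r *= p[digit]
            expected = windows * p_r
            psi2 += (counts.get(r, 0) - expected) ** 2 / expected
        total += prob * psi2
    return total


p = [Fraction(1, 3), Fraction(2, 3)]           # unequal digit probabilities, t = 2
open_case = expected_psi2(p, N=5, nu=2, cyclic=False)
cyclic_case = expected_psi2(p, N=5, nu=3, cyclic=True)
```

Both cases respect condition (5), and the enumeration returns exactly $t^\nu - 1$ (3 and 7 here) even though the digit probabilities are unequal.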


REFERENCES 
[1] I. J. GOOD, "The serial test for sampling numbers and other tests for randomness,"
Proc. Cambridge Philos. Soc., Vol. 49 (1953), pp. 276-284.

[2] P. BILLINGSLEY, "Asymptotic distributions of two goodness of fit criteria," Ann.
Math. Stat., Vol. 27 (1956), pp. 1123-1129.





ON THE SPECIFICATION ERROR IN REGRESSION ANALYSIS 


By H. Wold and P. Faxér
University of Uppsala, Sweden 


In the difference between a statistical estimate and the corresponding theoreti- 
cal value it is customary to distinguish between the sampling error, which arises 
because the estimate is based on a finite sample from a specified population, 
and the specification error, which arises if the population is not correctly described 
in the assumptions that form the basis of the estimation method. It is easy to 
see that the specification error of a least squares regression coefficient will be 
small if (A) the disturbance term is small, or if (B) the disturbance is nearly 
uncorrelated with the explanatory variables. The proximity theorem ([1],
Theorem 12.1.3; see also p. 37) states the simple fact that conditions (A) and
(B) strengthen each other, to the effect that if they are fulfilled up to magnitudes 
of the first order, the specification error will be small of the second order. The 
present note gives limits for the unspecified constant that is involved in the 
proximity theorem. 

We shall first prove an auxiliary lemma which contains the proximity theorem, 
and from which the limits sought for will be deduced by way of a corollary. It 
is sufficient for our purpose to consider large samples, so as not to place empha- 
sis on the difference between observed and theoretical values for variances, cor- 
relation coefficients, etc. 

LEMMA. Given the theoretical relation

(1)    $y = \beta_1 x_1 + \cdots + \beta_h x_h + \zeta$

suppose: (a) the disturbance $\zeta$ has zero expectation and finite variance $\sigma^2(\zeta)$, but is
otherwise arbitrary, and (b) none of the explanatory variables $x_1, \cdots, x_h$ is
identically linear in the other ones. Let

(2)    $y = b_1 x_1 + \cdots + b_h x_h + z$

be the least squares regression of y on $x_1, \cdots, x_h$. Then

(3)    $|b_i - \beta_i| \le \dfrac{\sigma(\zeta)}{\sigma(x_i)\sqrt{1 - R_i^2}}, \qquad i = 1, \cdots, h,$

where $R_i = R_{i(1, \cdots, i-1, i+1, \cdots, h)}$ is the multiple correlation coefficient of $x_i$ and
$x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_h$.

PROOF. The assumptions of the lemma lead us to regard the joint distribution
of $x_1, \cdots, x_h$ as given and the distribution of $\zeta$ as unspecified. Hence if $\rho(\xi, \eta)$
denotes the correlation coefficient of $\xi$ and $\eta$, the coefficients

$$\rho_{ij} = \rho(x_i, x_j), \qquad i, j = 1, \cdots, h,$$

Received May 18, 1956 







are given, with $\rho_{ii} = 1$, while the

$$r_i = \rho(x_i, \zeta), \qquad i = 1, \cdots, h,$$

are unspecified. Then if $(w_1, \cdots, w_h)$ are coordinates of h-dimensional Euclidean
space, it is known ([2]; cf. also [1], Theorem 12.3.5) that the point $r = (r_1, \cdots,
r_h)$ lies inside or on the boundary of the ellipsoid defined by

(4)    $\begin{vmatrix} 1 & w_1 & \cdots & w_h \\ w_1 & \rho_{11} & \cdots & \rho_{1h} \\ \vdots & \vdots & & \vdots \\ w_h & \rho_{h1} & \cdots & \rho_{hh} \end{vmatrix} = 0.$

Writing P for the determinant

$$P = |\rho_{ij}|, \qquad i, j = 1, \cdots, h,$$

and $P_{ij}$ for its cofactors, we further remark that (b) implies

$$1 - R_i^2 = P/P_{ii} > 0, \qquad i = 1, \cdots, h.$$

It will suffice to verify (3) for i = 1. The method of least squares regression
gives

(5)    $b_1 - \beta_1 = \sigma(\zeta) \sum_{j=1}^{h} r_j P_{1j} \Big/ \sigma(x_1) P.$

Next, $b_1 - \beta_1$ being proportional to the linear form

$$f = P_{11} r_1 + \cdots + P_{1h} r_h,$$

we ask for the extremum of f, and this clearly occurs for a point r on the
boundary of the ellipsoid (4), that is, for a point which satisfies

(6)    $\sum_{i,j=1}^{h} P_{ij} r_i r_j = P.$

We shall accordingly seek the extremum of

$$F = f + \lambda \Big[ \sum_{i,j=1}^{h} P_{ij} r_i r_j - P \Big],$$

where $\lambda$ is a Lagrange multiplier. This leads to the system

$$P_{1i} + 2\lambda \sum_{j=1}^{h} P_{ij} r_j = 0, \qquad i = 1, \cdots, h,$$

which combined with (6) gives the solution

$$r_1 = \pm\sqrt{1 - R_1^2}; \qquad r_2 = \cdots = r_h = 0.$$

Substituting in (5), we infer that the extremum of the specification error is given
by the right-hand member of (3). The lemma is proved.







We shall now render the introductory conditions (A)-(B) exact. First, we write

(A)    $\sigma(\zeta) \le \epsilon \cdot \sigma(x_i), \qquad i = 1, \cdots, h,$

where $\epsilon \ge 0$. Then if $\epsilon$ is small, the disturbance $\zeta$ is small in the sense that its
standard deviation is small relative to the standard deviations of the
explanatory variables. To give condition (B) a convenient form we observe that for
given $r_1, \cdots, r_h$ there is a point $(r_1', \cdots, r_h')$ on the boundary of the ellipsoid
(4) and a proportionality factor $\epsilon'$ with $0 \le \epsilon' \le 1$ such that

(B)    $r_i = \epsilon' \cdot r_i', \qquad i = 1, \cdots, h.$

Then if $\epsilon'$ is small, the correlations $r_i$ are small in the sense that the point
$(r_1, \cdots, r_h)$ lies near the centre of the ellipsoid.
Thus prepared, we obtain the following
COROLLARY. On conditions (a) and (b) of the lemma, we have

$$|b_i - \beta_i| \le \epsilon\epsilon' / \sqrt{1 - R_i^2}, \qquad i = 1, \cdots, h,$$

where $\epsilon$ and $\epsilon'$ are defined by (A)-(B).
Hence if $\epsilon$ and $\epsilon'$ are of small order the specification error of the regression
coefficients $b_i$ will at most be of order $\epsilon\epsilon'$.
In the special case of one explanatory variable, h = 1, we have $R_1 = 0$ and
$|b_1 - \beta_1| \le \epsilon\epsilon'$. For example, if $\sigma(\zeta) = \tfrac{1}{5}\sigma(x_1)$ and $r_1 = \rho(x_1, \zeta) = \tfrac{1}{5}$, the
specification error of $b_1$ cannot exceed 0.04.
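The bound (3) can be illustrated numerically for two explanatory variables. The Python sketch below is a check under stated assumptions, not part of the original argument: the function names are hypothetical, h = 2 is chosen so that the inverse correlation matrix can be written out by hand, and the specification error is computed from (5) with $P = 1 - \rho^2$, $P_{11} = P_{22} = 1$, $P_{12} = -\rho$, where $\rho$ is the correlation between the two explanatory variables.

```python
import math


def spec_error_h2(sigma_zeta, sigma_x, rho, r):
    """Specification errors (b_1 - beta_1, b_2 - beta_2) for h = 2, from (5)."""
    P = 1.0 - rho * rho                      # determinant of the 2 x 2 correlation matrix
    cof = [[1.0, -rho], [-rho, 1.0]]         # its cofactors P_ij
    return [
        sigma_zeta * (cof[i][0] * r[0] + cof[i][1] * r[1]) / (sigma_x[i] * P)
        for i in range(2)
    ]


def lemma_bound(sigma_zeta, sigma_x_i, R_sq):
    """Right-hand member of (3)."""
    return sigma_zeta / (sigma_x_i * math.sqrt(1.0 - R_sq))


def inside_ellipsoid(rho, r):
    """Quadratic form of (6) not exceeding P, i.e. r is an admissible point."""
    return r[0] ** 2 - 2 * rho * r[0] * r[1] + r[1] ** 2 <= 1.0 - rho * rho + 1e-12


rho, sigma_zeta, sigma_x = 0.6, 0.5, [2.0, 1.0]
bounds = [lemma_bound(sigma_zeta, s, rho * rho) for s in sigma_x]

# Extremal point from the Lagrange solution: r_1 = sqrt(1 - R_1^2), r_2 = 0.
extremal = spec_error_h2(sigma_zeta, sigma_x, rho, [math.sqrt(1 - rho * rho), 0.0])

# Largest error found on a grid of admissible correlation pairs.
grid = [k / 50.0 for k in range(-50, 51)]
worst = [0.0, 0.0]
for r1 in grid:
    for r2 in grid:
        if inside_ellipsoid(rho, (r1, r2)):
            err = spec_error_h2(sigma_zeta, sigma_x, rho, (r1, r2))
            worst = [max(worst[i], abs(err[i])) for i in range(2)]
```

In this example the extremal point attains the bound for $b_1$ (0.3125), and no admissible grid point exceeds either bound, which is what the lemma asserts.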


REFERENCES 


1. H. WOLD IN ASSOCIATION WITH L. JURÉEN, Demand Analysis: A Study in Econometrics,
Geber, Stockholm, 1952; and John Wiley and Sons, New York, 1953.

2. H. WOLD, "A theorem on regression coefficients obtained from successively extended sets
of variables," Skand. Aktuarietids., Vol. 28 (1945), pp. 181-200.




SETS OF MEASURES NOT ADMITTING NECESSARY AND
SUFFICIENT STATISTICS OR SUBFIELDS¹, ²


By T. S. Pitcher³


Let X be the interval from 0 to 1 and F the field of Borel sets on X. For every
$x < \tfrac{1}{2}$, let $m_x$ be the probability measure assigning probability $\tfrac{1}{2}$ to the point x
and probability $\tfrac{1}{2}$ to the point $(x + \tfrac{1}{2})$ and let $F_x$ be the subfield of F consisting
of all Borel sets which contain both x and $(x + \tfrac{1}{2})$ or else neither. Then if M
is a set of probability measures consisting of all $m_x$, $0 \le x < \tfrac{1}{2}$, and some
measures assigning probability 0 to every point, the only set of m-measure zero


Received May 4, 1956; revised June 12, 1956. 

¹ The research in this document was supported jointly by the Army, Navy, and Air
Force, under contract with the Massachusetts Institute of Technology.

² For definitions of these concepts, see references [1] and [2].

³ Staff Member, M.I.T., Lincoln Laboratory.







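for every m in M is the empty set. Each $F_x$ is a sufficient subfield for M with
conditional expectations defined by

$$E(f \mid F_x)(y) = f(y) \qquad \text{if } y \ne x \text{ or } (x + \tfrac{1}{2}),$$

$$E(f \mid F_x)(y) = \tfrac{1}{2}[f(x) + f(x + \tfrac{1}{2})] \qquad \text{if } y = x \text{ or } (x + \tfrac{1}{2}).$$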
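The conditional expectation just defined is easy to experiment with on a computer. The Python sketch below is a small illustration under stated assumptions: the names `m`, `cond_exp` and `integral` are hypothetical, each two-point measure $m_x$ is represented as a dictionary of point masses, and only integrals over the whole space are compared (the simplest instance of the defining property of a conditional expectation).

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def m(x):
    """The measure m_x: mass 1/2 at x and mass 1/2 at x + 1/2."""
    return {x: HALF, x + HALF: HALF}


def cond_exp(f, x):
    """E(f | F_x): average f over the pair {x, x + 1/2}, leave f alone elsewhere."""
    def g(y):
        if y == x or y == x + HALF:
            return HALF * (f(x) + f(x + HALF))
        return f(y)
    return g


def integral(f, measure):
    """Integral of f with respect to a finitely supported measure."""
    return sum(mass * f(point) for point, mass in measure.items())


def f(y):
    return y * y            # an arbitrary test function


x = Fraction(1, 8)
g = cond_exp(f, x)
checks = [
    integral(g, m(xp)) == integral(f, m(xp))
    for xp in (Fraction(1, 8), Fraction(1, 4), Fraction(3, 8))
]
```

The same function `g` reproduces the integral of f under every two-point measure tried, not only under $m_x$ itself, which reflects the fact that one version of the conditional expectation serves for all of these measures.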


If M has a best sufficient subfield $F^0$, then $F^0$ includes only sets having the
property that for every $x < \tfrac{1}{2}$, either both x and $(x + \tfrac{1}{2})$ are in the set or neither is,
so that $E(f \mid F^0)(x) = E(f \mid F^0)(x + \tfrac{1}{2})$. In particular if we write $f_x$ for the
characteristic function of the point x, then $E(f_x \mid F^0) = \tfrac{1}{2}(f_x + f_{x+1/2})$ if $x < \tfrac{1}{2}$.
If f is the characteristic function of the interval from 0 to $\tfrac{1}{2}$, $E(f \mid F^0) = E(f_x \mid F^0)$
for every $x < \tfrac{1}{2}$ since $f = f_x$, so that $E(f \mid F^0) = \tfrac{1}{2}$ everywhere; and since
$\int E(f \mid F^0)\, dm_x = \tfrac{1}{2}[E(f \mid F^0)(x) + E(f \mid F^0)(x + \tfrac{1}{2})] = \int f\, dm_x = \tfrac{1}{2}$ we have
$E(f \mid F^0) = \tfrac{1}{2}$ everywhere. Hence for any other m in M, $\int_0^{1/2} dm = \int f\, dm =
\int E(f \mid F^0)\, dm = \tfrac{1}{2}\int dm = \tfrac{1}{2}$, which is clearly not true in general.

Another example of the same type can be made up as follows: let $A_1$ and $A_2$
be subsets of the intervals from 0 to $\tfrac{1}{2}$ and $\tfrac{1}{2}$ to 1 respectively, let $\phi$ be a 1:1
map of $A_1$ onto $A_2$, and suppose that $A_1$ is a Borel set, but $A_2$ is not.⁴ For every
x in $A_1$, let $m_x$ assign probability $\tfrac{1}{2}$ to x and probability $\tfrac{1}{2}$ to $\phi(x)$; for every x
not in $A_1$ or $A_2$, let $m_x$ assign probability 1 to x. If M consists of all these $m_x$,
then, as above, the subfields $F_x$ of sets containing either both x and $\phi(x)$ or neither
are sufficient for M. Moreover, if $f_1$ and $f_2$ are the characteristic functions of
$A_1$ and $A_2$ respectively and $F^0$ is a sufficient subfield contained in all the $F_x$,
then, necessarily, $E(f_1 \mid F^0) = \tfrac{1}{2}(f_1 + f_2)$, which is not measurable.

These examples do not have necessary and sufficient statistics. Each of the
fields $F_x$ is induced by the statistic $T_x$ defined by $T_x(x) = T_x(x + \tfrac{1}{2}) = x$ and
$T_x(y) = y$, otherwise. The $T_x$ are sufficient since the $F_x$ are. Since the only
M-null set is the empty set, a necessary and sufficient statistic $T^0$ would have to be
exactly a function of each $T_x$ and hence would induce a sufficient subfield $F^0$
contained in all the $F_x$. This has already been shown to be impossible.

REFERENCES 
[1] E. L. LEHMANN AND H. SCHEFFÉ, "Completeness, similar regions, and unbiased
estimation, Part I," Sankhya, Vol. 10 (1950), pp. 305-340.
[2] R. R. BAHADUR, "Sufficiency and statistical decision functions," Ann. Math. Stat.,
Vol. 25 (1954), pp. 423-462.
[3] C. KURATOWSKI, Topologie I, Deuxième Edition, Monografie Matematyczne, Warszawa,
1948.


⁴ The existence of such a map is proved in [3].





NEWS AND NOTICES 


CORRECTION TO “A NOTE ON COMBINED INTERBLOCK 
AND INTRABLOCK ESTIMATION IN INCOMPLETE 
BLOCK DESIGNS” 


By D. A. Sprott
University of Toronto 


In the paper "A note on combined interblock and intrablock estimation in
incomplete block designs," Ann. Math. Stat., Vol. 27 (1956), pp. 633-641, the
third sentence "Thus it is possible to estimate variety ..." is obviously false
and should be deleted. The correction does not affect the rest of the paper.




NEWS AND NOTICES 
Readers are invited to submit to the Secretary of the Institute news items of interest 


Personal Items 


Professor Kenneth J. Arrow, Stanford University, is on leave for the academic 
year 1956-7 as Fellow of the Center for Advanced Study in the Behavioral 
Sciences. 

George J. Auner has been employed as Staff Statistician, Applied Mathematics 
Section, Technical Services with the Jones and Laughlin Steel Corporation, 
Pittsburgh, Pennsylvania. 

Dr. John L. Bagg has become Assistant Professor in the Department of Mathe- 
matics of Florida State University and is teaching mathematical statistics. 

Dr. Geoffrey Beall has resigned as Chairman of the Department of Statistics 
at the University of Connecticut and is now connected with the Division of 
Manufacturing, Gillette Safety Razor Company in Boston, Massachusetts. 

Stuart A. Bessler, formerly a graduate student and instructor of economics 
at the University of Minnesota, will continue his graduate studies at Stanford 
University where he will also be associated with the Applied Mathematics and 
Statistics Laboratory. 

Allan Birnbaum has been appointed visiting Lecturer in Mathematics at 
Imperial College of Science and Technology (University of London) for the year 
1956-1957. 

LeRoy S. Brenna has accepted a Graduate assistantship from V.P.I. to take
work leading to a Doctoral degree in Statistics. He is presently on leave of ab- 
sence from Eastman Kodak. 

D. R. Cox of the University of North Carolina, has been appointed to a Reader- 
ship in Statistics, University of London at Birkbeck College. 

Herbert T. David has been appointed Assistant Professor in the Statistical 
Laboratory of Iowa State College, Ames, Iowa. 

Bernard J. Derwort received his Ph.D. at St. Louis University in June 1956 
and is now a senior research engineer at North American Aviation, Inc., Co- 
lumbus, Ohio. 







David B. Duncan resigned his position in the Agricultural Experiment Station 
and College of Agriculture at the University of Florida to join the staffs of the 
Departments of Statistics and Biostatistics at the University of North Carolina 
on September 1, 1956. 

Meyer Dwass, who was a visiting faculty member of the Department of 
Statistics, Stanford University, has returned to his post at the Department of 
Mathematics, Northwestern University. 

Dr. Robert F. Fagot has accepted a position as Asst. Professor of Psychology 
at the University of Oregon, Eugene, Oregon. 

Edwards N. Fiske received his Master's degree from V.P.I. on June 10, 1956
and is now working for North American Aviation, Inc., Columbus, Ohio. 

Dr. Thornton C. Frye, Assistant to the President of Bell Telephone Labora- 
tories, retired from the Laboratories on October 1st, after completing more than 
40 years of Bell System service. He is a Fellow of the Institute of Mathematical 
Statistics. He is best known as an expert on the mathematical theory of prob- 
ability, and especially its application to various communication problems. He 
plans to engage in consulting work. 

Donald A. Gardiner has joined the staff of the Mathematics Panel of the Oak 
Ridge National Laboratory, Oak Ridge Tennessee, in the capacity of Statistician. 
His responsibilities include consulting with engineers engaged in research for 
the Atomic Energy Commission. 

Charles E. Gates, having completed a tour of duty in the U.S. Army, has ac- 
cepted the position of Station Statistician with the Agricultural Experiment 
Station at the University of Minnesota. 

Randolph R. M. Geoghagen has, since August, been appointed as a Research 
Engineer at Rocketdyne, a division of North American Aviation, Inc., Los 
Angeles, California. 

Earl L. Green has resigned as professor of zoology at the Ohio State University 
to become director of the Roscoe B. Jackson Memorial Laboratory, Bar Harbor, 
Maine, succeeding Dr. C. C. Little who founded the Laboratory in 1929. 

E. J. Gumbel, Adjunct Professor, Columbia University, taught Mathematical 
Statistics at the Free University of Berlin at the Summer Session of 1956. He has 
been restored to the position of University Professor by the state of Baden 
(Germany) and has been retired with this rank. (Gumbel was the first German 
professor dismissed by the Nazis). 

Bernard Harris returned to his position as mathematician for the Department 
of Defense after a year’s leave at the Applied Mathematics and Statistics Labora- 
tory of Stanford University. 

C. Roger Heimlich has left his position as statistician with the U. S. Public
Health Service, Communicable Disease Center, Atlanta, Georgia and joined the 
staff of the Cummins Engine Company, Columbus, Indiana, as quality control 
analyst. 

G. Ronald Herd received a Ph.D. degree in statistics from the Department 
of Statistics, Iowa State College, in December 1956; his doctoral thesis dealt







with “Estimation of the parameters of a population from a multi-censored 
sample.” He is remaining with Aeronautical Radio, Inc., in Washington, D. C., 
as Consultant in the Reliability Research Department. 

W. T. Hirnyck has accepted the position of Quality Control Supervisor with 
the Continental Can Company, Inc., Bond Crown and Cork Division, Wilming- 
ton, Delaware. 

Dr. John F. Hofmann, after resigning from his position at Corona and serving 
as a consultant in statistics at North American Aviation and Douglas Aircraft, 
has joined the staff of Systems Laboratories Corp. of Sherman Oaks, California. 
SLC is a recently formed company of scientists, mathematicians and engineers 
serving defense industries as consultants. 

William C. James has been transferred from Caracas, Venezuela to San Salva- 
dor, El Salvador where his work will be the same as that before, technical advisor 
in Vital and Public Health Statistics under the program of the United States 
Operations Mission to El Salvador. 

M. V. Johns, Jr. having received his Ph.D. from Columbia University, is now 
employed as a Research Associate at the Applied Mathematics and Statistics 
Laboratory, Stanford University. 

Mortimer B. Keats has resigned as Head, Analysis Division, Quality Surveillance
Department, U. S. Naval Powder Factory, Indian Head, Maryland,
to accept a position as Statistician, with the Light Military Electronics Equip- 
ment Department, General Electric Company, Utica, New York. 

Melville R. Klauber is now serving as a 2nd Lt. in the Air Force at Ellington 
Air Force Base. 

William Kruskal has returned to the Committee on Statistics, University of 
Chicago, after a year’s visit with the Department of Statistics, University of 
California, Berkeley. 

Thomas E. Kurtz, who received his Ph.D. from Princeton University in June, 
1956, is now an instructor in the Department of Mathematics, Dartmouth 
College. 

Alfred Lieberman, formerly with the Advance Development Division of the 
AVCO Mfg. Corp., has joined the Statistical Engineering Section of the Bureau 
of Ships as mathematical statistician in charge of experimental design. 

George F. Lunger has resigned his position as Senior Analyst on the Ford 
Motor Company Quality Control Staff to accept a position as mathematician 
in the Advanced Applications Section of the Univac Division of Sperry Rand
Corporation located at St. Paul, Minnesota. 

Richard B. McHugh is a Visiting Lecturer in the Biostatistics Division, Medi- 
cal College, University of Minnesota during 1956-57, on leave from his position 
as associate professor of psychology and statistics at Iowa State College. 

G. L. Marcus is currently employed at the Armour Research Laboratories, 
Statistical Research Section, Chicago, Illinois. 

Upon completing requirements for a doctorate in statistics, Carl E. Marshall 
returned last fall to his position as professor of mathematics and director of the 







Statistical Laboratory at Oklahoma A&M College. He was granted the Ph.D. 
degree by Iowa State College in December; his dissertation was entitled ‘‘Cost 
control of sample surveys by two-step designs.” 

W. Jay Merrill, Jr. has joined the General Electric Company in the Statistical 
Methods Section of The General Engineering Laboratory as a consulting statis- 
tician for engineering and administrative applications. 

Donald F. Morrison has reported for active duty with the U. S. Public Health
Service, Commissioned Corp, National Institute of Mental Health, Bethesda, 
Maryland, as a Mathematical Statistician. Prior to this he was a graduate 
student at the University of North Carolina and staff member, Lincoln Labora- 
tory, Massachusetts Institute of Technology. 

Milton Morrison, formerly Research Engineer-Statistician with the Experi- 
mental Towing Tank, Stevens Institute of Technology, is now with Vitro Labora- 
tories, West Orange, New Jersey. His duties include work in statistics and opera- 
tions research. 

Eliezer Naddor, formerly of the Operations Research Group, Case Institute 
of Technology, has been appointed Assistant Professor of Industrial Engineering 
at the School of Engineering, Johns Hopkins University, Baltimore, Maryland. 

John D. Neff received his Ph.D. at the University of Florida in August, 1956, 
and is now an instructor in the Department of Mathematics at Case Institute of 
Technology. 

Dr. M. G. Neurdenburg, D.P.H., Med. Off. of Health (Statistics), has moved 
from Amsterdam to Therese Schwartzestraat 20, The Hague, Netherlands. 

Dr. Wesley L. Nicholson has recently accepted a position as research statis- 
tician with the Hanford Atomic Products Operation of the General Electric 
Company, Richland, Washington. 

Dr. Bernard Ostle, Professor of Mathematics at Montana State College has 
been appointed Director of the recently formed Statistical Laboratory. 

Emanuel Parzen, formerly Assistant Professor of Mathematical Statistics at 
Columbia, is now Assistant Professor of Statistics at Stanford. 

Roger S. Pinkham is on leave of absence for a year from Duke University and 
is spending the year at Princeton as a research assistant. 

Frank Proschan is on leave from Sylvania’s Missile Systems Laboratory to 
study for a Ph.D. in statistics at Stanford University. 

Ronald Pyke, having received his Ph.D. at the University of Washington, has 
accepted a Research Associate’s position at the Applied Math. and Stat. Lab. of 
Stanford University. 

Bayard Rankin has accepted a position as Assistant Professor in the Depart- 
ment of Mathematics, Case Institute of Technology. 

Herman Ravitch, who is now a graduate student in the statistics department 
of the University of Pennsylvania, is now employed as a mathematical statistician 


on Project Caramu, an Air Force project contracted by the University of Penn- 
sylvania. 







Dr. Lowell J. Reed retired from the presidency of The Johns Hopkins University
on October 1, 1956, being succeeded by Dr. Milton S. Eisenhower. Dr.
Reed’s future address will be Shelburne, New Hampshire. 

John H. Rosenboom, formerly at Johns Hopkins Operations Research Office, 
has joined the Operations Research Department of Booz, Allen and Hamilton 
in Chicago. 

Marvin A. Schneiderman has been appointed Biometric Consultant to the 
Cancer Chemotherapy National Service Center, a cooperative group sponsored 
by the United States Public Health Service, the Atomic Energy Commission, 
the Veterans Administration, the Food and Drug Administration, the American 
Cancer Society, and the Damon Runyon Fund. 

Professor Jack Silber has returned to Roosevelt University after spending the 
spring and summer semesters as consultant to the Assistant for Operations 
Analysis, United States Air Force. 

Morris Skibinsky, on leave of absence from Purdue University this year, is 
a visiting Assistant Professor in the Department of Statistics at Michigan State 
University. 

James E. Thompson is on a fellowship from his position as mathematician with 
the Defense Department and is doing graduate work with the Department of 
Statistics at Stanford University for one year. 

Dr. B. D. Tikkiwal has joined Karnatak University Dharwar as Head of the 
Dept. of Statistics since June, 1956 on his return from the U.S.A. During his 
stay in the United States he completed his Ph.D. at the University of North 
Carolina, Raleigh, N. C., in August 1955 and then worked for some time in the 
Institute of Statistics at Raleigh as Assistant Statistician. 

Howard G. Tucker, formerly assistant professor of mathematics at the Uni- 
versity of Oregon, is now Assistant Professor of Mathematics at the University 
of California, Riverside. 

M. C. K. Tweedie, who has been on the faculty of the Virginia Polytechnic 
Institute for the past four years, has taken a position as Temporary Lecturer in 
Mathematical Statistics at the University of Manchester, England. 

W. Allen Wallis has become Dean of the Graduate Business School at the 
University of Chicago, but is on leave of absence for 1955-56 as a Fellow of the 
Ford Center for Advanced Study at Stanford. He is continuing as Editor of the 
Journal of the American Statistical Association. 

Ernest Charles Weigel, who received an M.S. in applied statistics at Rutgers 
University, New Brunswick, New Jersey, has accepted the position of Accountant 
Analyst with Western Electric Company, Inc., Kearny, New Jersey. 

Dr. Max A. Woodbury has accepted an appointment as Professor of Mathe- 
matics, College of Engineering, New York University, New York 53, N. Y. 

Evan J. Williams of the Division of Mathematical Statistics, Commonwealth 
Scientific and Industrial Research Organization, Australia, has taken up a posi- 
tion as Visiting Research Professor with the Institute of Statistics, University 
of North Carolina, Raleigh, N. C. 







New Members 


The following persons have been elected to membership in the Institute 
August 16, 1956 to November 7, 1956 


Alcaide, Angel, Doctor en Ciencias Economicas (University of Madrid), Estadistico Fa- 
cultativo, Instituto Nacional de Estadistica, Calle Ferras, No. 41, Madrid, Spain, 
Calle de la Ruda, No. 17, Madrid, Spain. 

Allen, George H., B.S. Math. (Boston College), Statistician, Missile and Radar Division, 
Raytheon Manufacturing Co., Maynard, Massachusetts, 79 Nahanton Ave., Milton 
86, Mass. 

Anderson, William B., B.A. (UCLA), Mathematician, The Rand Corp., Santa Monica, 
California, 34 Forest Ave., Natick, Mass. 

Benepe, Otis J., Ph.D. (University of Washington), Technician, Ramo-Wooldridge Corp., 
5730 Arbor Vitae, Los Angeles 45, Calif. 

Bleicher, Edwin, M.S. (VPI), Technical Staff Member, Bell Telephone Laboratories, Inc., 
555 Union Blvd., Allentown, Pa. 

Bundgaard, Svend, Professor, Matematisk Institut, Universitetsparken, Aarhus, Denmark. 

Campbell, Robert L., M.Ed. (Louisiana State Univ.), Instructor, Teachers College, Univ. 
of Hawaii, Honolulu, T. H. 

Ciminera, Joseph L., B.Sc. (Phila. College Pharmacy and Science), Biometrician and 
Lecturer in Stat., Merck Sharp and Dohme Research Laboratories, West Point, Pa., 
Villanova Univ., Pa. 

Eissner, Robert M., M.A. (University of Delaware), Analytical Statistician, Field Artillery 
Ammo. Section, Surv. Branch, Ballistic Research Laboratories, Aberdeen Proving 
Ground, Maryland, P.O. Box 446, Newark, Delaware. 

Epling, Mary L., B.S. (American University), Mathematician, National Bureau of Stand- 
ards, Washington 25, D. C., 5142 Nebraska Ave., N. W., Washington 8, D.C. 

Fagen, R. E., Ph.D. (Univ. of Minnesota), Mathematician, Ensign USNR, Department of 
Defense, Washington, D. C., 4532 Fairfield Drive, Bethesda, Maryland. 

Franco, Antonio Jose Lopez, (Actuario-Universidad Central de Venezuela) Jefe Departa-
mento Incendio, Robo y Fianzas, Seguros La Metropolitana S.A., Edificio Rivero,
Ibarras a Pelota, 1er piso, Apartado Postal 2197, Caracas, Venezuela.

Fujisawa, Hideo, B.S. (Kyushu University), Graduate, Mathematical Institute, Faculty 
of Science, Kyushu University, Fukuoka, Japan. 

Gómez, Régulo Carmona, Dr. en Ciencias Estadisticas y Actuariales (Universidad Cen-
tral de Venezuela), Jefe del Dpto de Estadistica, Seguros La Metropolitana S.A., Edif.
Rivero-Ibarras a Pelota No. 14, 1er piso, Seguros La Metropolitana S.A., Aptdo. Postal
No. 2197, Caracas, Venezuela.

Green, Thomas F., M.S. (Lehigh Univ.), Graduate Student, Univ. of North Carolina, 
Chapel Hill, N. C., 208 Glenburnie St., Chapel Hill, N.C. 

Hall, Robert A., B.S. (Univ. of Washington), Teaching Assistant, University of Wash- 
ington, Seattle 5, Washington, 4220 12 Ave., Seattle 5, Washington. 

Hannye, Nancy Lee, A.B. (Wilkes College), Research Assistant, Cornell Univ., Ithaca, 
N. Y., Mathematics Dept. White Hall, Cornell Univ., Ithaca, N.Y. 

Hansen, Erik, M.A. (University of Aarhus), Research Assistant, Institute of Statistics, 
University of Copenhagen, Sct. Pederstraede 19, Copenhagen, K, Daemningen 37, 
Kobenhavn, Valby, Denmark. 

Iglesias G., Tomas, Ph.D. (University of Madrid), Researcher, Junta de Energia Nuclear, 
Serrano 121, Madrid, Spain. 

Itkin, Arthur G., M.S. (New York Univ.), Statistician, Merck and Co., Inc., Rahway, 
N. J., 925 Jersey Avenue, Elizabeth, New Jersey. 

Jensen, Ernst Lykke, Cand. Polit. (University of Copenhagen), Research Assistant, Statis-
tical Institute, University of Copenhagen, 19 Sankt Pedersstræde, Copenhagen K.,
Thorupgard Alle 10, Vanløse, Denmark.

Kesten, Harry, Ph.D. (University of Amsterdam), Assistant, Statistical Department, 
Mathematical Centre, Amsterdam, 24 Boerhavestr, Holland, Room No. 453, Cascadilla 
Hall, Cornell University, Ithaca, N. Y. (academic year 1956-57). 

Keyser, C. J., M.S. (Stevens Inst. of Tech.), Technical Staff Member, Quality Assurance 
Department, Bell Telephone Laboratories, Inc., 463 West St., New York 14, N. Y.

Koehler, Truman L., Jr., B.S. (Muhlenberg College), Statistician-Quality Control Engi-
neer, Sylvania Electric Products, Inc., Towanda, Penna.

Kupferberg, Stanley J., B.S. (Queens College), Student, Univ. of Chicago, Chicago, Illinois, 
1418 E. 60th Street, Chicago 37, Illinois. 

Meisner, Morris J., B.S. (City College of New York), Engineer, Curtiss-Wright Corpora- 
tion, Wright Aeronautical Division, Wood-Ridge, New Jersey, 713-15 East 9th St., 
New York 9, New York. 

Mendenhall, William, 3rd., M.S. (Bucknell Univ.), Student, N. C. State College, Depart- 
ment of Experimental Statistics, Raleigh, North Carolina. 

Morimoto, Haruki, B.S. (Tokyo University), Graduate Student, Tokyo University, Moto- 
fujimachi 1, Bunkyo, Tokyo, Japan, 782 Saginomiya, 2-chome, Nakano, Tokyo, Japan. 

Neuburg, E. P., M.S. (Univ. of Chicago), Mathematician, Dept. of Defense, National 
Security Agency, Washington 25, D. C., Route 2, Silver Spring, Maryland. 

Potthoff, Richard F., B.A. (Swarthmore College), Graduate Student, Dept. of Statistics, 
Univ. of North Carolina, Chapel Hill, N. C. 

Proctor, Charles H., M.A. (Michigan State Univ.) Instructor, Dept. of Statistics, Michigan 
State University, East Lansing, Michigan. 

Reveron O. C., Licenciado en Ciencias Estadisticas (Universidad Central de Venezuela), 
Banco Central de Venezuela, Esquina de Sta. Capilla, Caracas, Venezuela. 

Roberts, Donald M., M.A. (University of Illinois), Research Assistant, Applied Mathe- 
matics and Statistics Lab., Stanford University, Stanford, California. 

Rosenfeld, Abraham, M.S. (M.I.T.), Analytical Statistician, Section Chief, Surveillance 
Branch, Weapon Systems Lab., B.R.L. Aberdeen Proving Ground, Maryland, R.D. 
No. 2, Aberdeen, Maryland. 

Shah, Vishwa Parakram, M.S. (N. C. State College), Project Director, Life Insurance 
Agency Management Association, 855 Asylum Ave., Hartford 5, Conn. 

Shapiro, J. M., Ph.D. (University of Minnesota), Asst. Prof. of Math., Ohio State Uni- 
versity, Columbus 10, Ohio, Department of Mathematics. 

Shelly, Maynard W., Jr., Ph.D. (University of Wisconsin), Research Associate, 259 Nata- 
torium Laboratory of Aviation Psychology, Ohio State University, Columbus 10, Ohio, 
1824-C Northwest Court, Columbus 12, Ohio. 

Shelton, John R., B.A. (New York University), Division Chief, Operations Standard 
Division, The Port of New York Authority, 111 Eighth Avenue, New York 11, New 
York. 

Shirafuji, Michie, B.S. (Kyushu University), Graduate Student, Kyushu University, 
Faculty of Science, Fukuoka, Japan. 

Tanaka, Hiroshi, B.S. (Kyushu University), Graduate, Kyushu University, Faculty of 
Science, Fukuoka, Japan. 

Theil, Henri, Ph.D., (University of Amsterdam), Professor, Director of the Econometric 
Institute, Netherlands School of Economics, Pieter de Hoochweg, Rotterdam, The 
Netherlands. 

Toda, Hideo, Bach. of Eng. (Univ. of Tokyo), Graduate Student, Department of Applied 
Physics, Faculty of Engineering, University of Tokyo, Hongo, Tokyo, Japan, 6-432, 
Omiyamae-tyo, Suginami-ku, Tokyo, Japan. 

Tung, Irene, M.A., (University of Michigan), Instructor, Dept. of Mathematics, Uni-
versity of Houston, Cullen Boulevard, Houston 4, Texas, 4380 Blodgett, Houston 4,
Texas.


Twery, Raymond J., A.M. (Univ. of Ill.), Mathematics Asst., Univ. of Illinois, Urbana, 
Illinois, 1107 W. Green (Apt. 621), Urbana, Illinois. 

Vaart, H. Robert van der, D.Sc. (State University of Leiden) Scientific head-officer, Science 
Faculty of Leiden University, and Docent in the same Faculty, Zoological Laboratory, 
Kaiserstraat 63, Leiden, Netherlands. 

Ward, Ulysses V., B.A. (Wayne University), Social Worker, Michigan State Department of 
Social Welfare, Wayne County Bureau of Social Aid, 32 Duffield, Detroit 1, Michigan, 
3903 Montclair Avenue, Detroit 14, Michigan. 

Yanes Castro, Ana Isabel, Licenciado en Ciencias Estadisticas (Universidad Central de 
Venezuela) Ministerio del Trabajo (Direccion de Prevision Social), Edificio Sur- 
Avenida Bolivar, Caracas, Venezuela, Sur 9 bis No. 2-Caracas, Venezuela. 




CIVIL SERVICE APPLICATIONS 


The United States Civil Service Commission announces that applications are 
being accepted for Engineer and Physical Science positions for duty in activities 
of the Potomac River Naval Command in and near Washington, D. C., and in 
the Engineer Center, U.S. Army, Fort Belvoir, Virginia. The beginning salaries 
range from $4,480 to $11,610 a year. 

To qualify for the lower grade positions, applicants must have had appropriate 
education or experience or a combination of both. Additional professional ex- 
perience is required for the higher grades. 

Further information and application forms may be obtained at many post 
offices throughout the country, or from the U. S. Civil Service Commission, 
Washington 25, D. C. Applications must be filed with the Executive Secretary, 
Board of U. S. Civil Service Examiners for Scientific and Technical Personnel, 
Potomac River Naval Command, Building 72, Naval Research Laboratory, 
Washington 25, D. C. They will be accepted until further notice. 




NEW DEPARTMENT OF STATISTICS AT MICHIGAN STATE 
UNIVERSITY 


The recently-formed Department of Statistics at Michigan State University 
is now tentatively organized. Following is the present staff roster; others may 
be added: Professors, W. D. Baten and Leo Katz; Associate Professors, K. J. 
Arnold and Ingram Olkin; Visiting Associate Professor, Gopinath Kallianpur; 
Assistant Professors, J. F. Hannan, C. H. Kraft and A. G. Laurent; Instructors, 
C. H. Proctor and John Van Dyke. 

Professor Hannan is on leave for 1956-57 at Stanford University. Professor 
Kallianpur is visiting at Michigan State University on leave from the Indian 
Statistical Institute, Calcutta. Professor Katz has been named Head of the 
Department, effective September 1, 1956. Professor Olkin is returning from a 
year’s leave, 1955-56, at the University of Chicago and Columbia University. 







Professor Sir Ronald A. Fisher will spend the Fall Quarter, 1957 in the De- 
partment of Statistics as Visiting Distinguished Professor. 




STATISTICAL CENTER OFFICIALLY OPENED 
TO FOREIGN STUDENTS 


In the agreement between the United Nations and the Philippine Govern- 
ment, covering the establishment and maintenance of the Statistical Center, 
University of the Philippines, the Philippine Government was to make the Center 
available to holders of United Nations fellowships or other persons from abroad 
interested in statistical training. Upon the implementation of this provision, it 
was planned that the U.N. would invite the countries in Southeast Asia to send 
students to the Statistical Center for training in both the theoretical and applied 
fields of statistics. 




UNIVERSITY OF MINNESOTA GRADUATE PROGRAM 


The Graduate School of the University of Minnesota announces for the second 
consecutive year a four-year graduate training and fellowship program for pros- 
pective research workers in the behavioral sciences, leading to the Ph.D. degree 
in one of the following behavioral disciplines: economics, political science, social 
anthropology, psychology or sociology. Competition for 1957-58 appointments 
is open to highly qualified students who have not completed previous graduate 
training but who expect to receive their B.A. degree before September 1957. 
Seeking to provide training for those who will be concerned with research re- 
quiring crucial contributions from two or more fields of social science knowledge, 
the program permits both specialization in a single social science field and mastery 
of a common core of subject matter from several behavioral disciplines. Success- 
ful applicants study on a four-year stipend schedule of alternating fellowship 
and assistantship support. Detailed information and application forms may be 
secured from Mrs. Ann Cornog, 404 Johnston Hall, University of Minnesota, 
Minneapolis 14, Minnesota. 


SUMMER SESSIONS AT BERKELEY, CALIFORNIA 


The 1957 summer program in the Department of Statistics of the University 
of California, Berkeley, California, will consist of two sessions: June 17 to July 
27 and July 29 to September 7. The faculty of the summer sessions will include 
Professor D. A. Edwards of Yale University, Dr. Walter L. Smith of the Univer- 
sity of Cambridge, England, Dr. M. M. Brooke, Dr. W. J. Hall, Dr. R. E. Serfling 
and Dr. M. B. Willis of the Communicable Disease Center, Public Health Service, 
Atlanta, Georgia, Dr. Nathan Mantel of the National Institutes of Health, 
Bethesda, Maryland and Professors Evelyn Fix, J. Neyman and R. A. Wijsman 
of the Department of Statistics of the University of California, Berkeley. 

The program includes two of the usual undergraduate courses in each session, 
adapted primarily to meet the needs of students transferring from other centers 
who would like to undertake advanced study at the University of California 
during the regular academic year. Also a research seminar in statistical problems 
of health will be conducted in close collaboration with the representatives of the 
Public Health Service and National Institutes of Health. The discussions will 
begin with the medico-biological side of the problem and will then be followed 
by the statistical treatment. 




SUMMER OFFERINGS IN STATISTICS AT IOWA STATE COLLEGE 


The Department of Statistics at Iowa State College will offer six applied 
courses in statistical theory and methods in its two 1957 summer sessions. These 
courses are planned primarily for graduate students or research workers with 
limited mathematical backgrounds who wish to use statistical techniques intelli- 
gently for application to other fields. In addition, a course on special topics in 
theoretical or applied statistics may be studied at the graduate level. Senior staff 
members will be available during most of the summer for consultations on re- 
search or special problems. 

Students may register for either or both of the six-week summer sessions: 
June 18 to July 24 and July 24 to August 30. The complete list of statistics 
offerings for the first session is as follows: Stat. 401, “Statistical Methods for 
Research Workers” (at the level of Snedecor’s Statistical Methods); Stat. 447, 
“Statistical Theory for Research Workers” (mainly theory of experimental sta-
tistics); Stat. 599, “Special Topics;” and Stat. 699, “Research.” In the second
session will be offered Stat. 402, a continuation of 401; Stat. 448, a continuation
of 447; two courses in applied methods which are more specialized: Stat. 411,
“Experimental Designs for Research Workers,” and Stat. 421, “Survey Designs
for Research Workers;” and finally Stat. 599 and 699. Additional information
may be obtained from T. A. Bancroft, department head and Director, Statistical 
Laboratory, Iowa State College. 




SOUTHERN REGIONAL GRADUATE SUMMER SESSIONS 
IN STATISTICS 


The fourth Southern Regional Graduate Summer Session in Statistics will be 
held during the summer of 1957 from June 12 through July 20 at the Virginia 
Polytechnic Institute, Blacksburg, Virginia. Each course carries approximately 
five quarter hours of graduate credit. The program may be entered at any session, 
and consecutive courses will be offered in successive summers. The summer 
work in statistics may be applied towards residence requirements at any one of 
the cooperating institutions, as well as certain other institutions, in partial ful- 
fillment of residence requirements for graduate degrees. Faculty and courses 
include E. J. Williams (analysis of variance), Daniel B. DeLury (sampling of 
biological populations), J. L. McHugh (laboratory accompanying Professor 
DeLury’s course), Willard O. Ash (statistical inference), L. S. Brenna (engineer-
ing statistics), Ralph A. Bradley (rank order statistics), C. W. Clunies-Ross
(probability), John E. Freund (stochastic processes), Rudolf J. Freund (sam- 
pling), Boyd Harshbarger, Clyde Y. Kramer (statistical methods), and R. Lowell 
Wine (least squares). Seminars will be conducted by many prominent statisti- 
cians. The total tuition fee will be $38.00 for the six-week term. Inquiries should 
be addressed to Boyd Harshbarger, Head, Department of Statistics, Virginia 
Polytechnic Institute, Blacksburg, Virginia. 


FINAL EDITORIAL REPORT 


The 1956 Annals contained ninety papers and notes, and a total (counting 
miscellaneous material) of 1212 pages. The increase in size allowed the backlog 
to be kept at somewhat less than one issue. The number of papers submitted in 
the year ending October 31, 1956 was about 7% more than in the preceding 
year, but the number of manuscript pages submitted showed a slight decrease. 
(See Interim report in the December, 1956 Annals.) 

Many thanks are due to the regular Cooperating Members of the Annals for 
their hard work, and grateful acknowledgment is made to the following people 
for generous assistance: A. C. Aitken, T. W. Anderson, R. R. Bahadur, N. T. J. 
Bailey, R. Bellman, P. Billingsley, A. W. Boldyreff, G. E. P. Box, A. B. Clarke, 
W. G. Cochran, H. E. Daniels, F. N. David, M. Donsker, M. Dresher, M. Dwass, 
H. P. Edmundson, D. A. Edwards, Joanne Elliot, B. Epstein, P. Erdös, T. Fer-
guson, D. J. Finney, G. Forsythe, I. J. Good, L. Goodman, F. Graybill, B. G.
Greenberg, J. Gurland, L. A. Goodman, P. R. Halmos, W. Hoeffding, P. G. 
Hoel, H. Hotelling, N. L. Johnson, M. Juncosa, O. Kempthorne, D. G. Kendall, 
M. G. Kendall, H. Levene, M. Loève, R. D. Lord, E. Lukacs, C. B. McGuire,
B. McMillan, W. G. Madow, A. W. Marshall, F. J. Massey, P. Meier, M. R. 
Mickey, A. M. Mood, P. A. P. Moran, K. W. Morton, L. Moses, M. Peisakoff, 
J. W. Pratt, R. Pyke, D. Ray, E. Reich, M. Rosenblatt, H. L. Royden, H. Ruben, 
M. R. Sampford, I. R. Savage, E. L. Scott, L. Shapley, B. Sherman, J. L. Snell, 
D. Stoller, A. H. Stone, R. F. Tate, H. Teicher, D. Teichroew, A. J. Thomasian, 
J. W. Tukey, D. Wallace, L. Wegner, B. L. Welch, O. Wesler, and C. B. Winsten. 

Special thanks are also due to Ann Greene, Patricia Stebbins, and Dorothy
Stewart for editorial assistance.

T. E. Harris 
Editor 
January 31, 1957 




PUBLICATIONS RECEIVED 


Steve, Sergio, Aspetti Attuali dell’Imposizione Indiretta, and Felice Vinci, L’Attività
dell’Istituto nello Scorso Anno Accademico, Vol. 22, Universita degli studi di Milano,
Istituto di Scienze Economiche e Statistiche, Milan, 1956, 21 pp.

Vinci, Felice, La Meccanica Economica Nel Pensiero di Vilfredo Pareto Teoria e Pratica dei
Costi Comuni, Vol. 23, Universita degli studi di Milano, Istituto di Scienze Economiche
e Statistiche, Milan, 1956, 27 pp.

















TRABAJOS DE ESTADISTICA 


Review published by “Instituto de Investigaciones Estadisticas” of the “Consejo
Superior de Investigaciones Cientificas,” Madrid, Spain.


Vol. VII CONTENTS Cuaderno III 


B. M. Bennett.....On the variance stabilizing properties of certain logarithmic
transformations.

Angel Vegas.....Inferencia estadistica en los modelos biométricos y su aplicación
al seguro de vida.

Sixto Rios.....Sobre la noción de estimador consistente.

NOTAS.

Algunas notas sobre la Estadistica en Holanda.....Azorin
Programacion lineal, un instrumento científico al servicio de los Empresarios.....E. Cansado

CRONICAS. BIBLIOGRAFIA. CUESTIONES. 


For everything in connection with works, exchanges and subscription write to Professor Sixto Rios, Instituto
de Investigaciones Estadisticas of the Consejo Superior de Investigaciones Cientificas (Serrano, 1 2), Madrid,
Spain. The Review is composed of three fascicles published three times a year (about 350 pages), and its annual
price is 100 pesetas for Spain and South America and $4.00 U.S.A. for all other countries.





THE 1955-56 PROCEEDINGS 


of the Business and Economic Statistics Section 
of the American Statistical Association 


include a wealth of information, presented by outstanding authorities, designed to help deal with problems 
confronted by business economists and business statisticians in their day-to-day activities. 


How Stable the Economy? 
Criteria for Private Debt * The Role of SUB * Costs and Prices * Productivity Cycles * Stability 
vs. Instability
Long-Term Growth 
Financial Implications * Measuring Growth Potentials * Projections Critically Appraised 
Progress in Business Statistics 
Impact of Electronic Computing * Flow of Funds * Labor Turnover * Consumer Price Indexes * 
An Integrated Federal Program 
Forecasts and Forecasting 
ue Duteas Outlook * Demands for Durable Goods * Forecasting in Industry * The Stock
Market


The material contained in the Proceedings reflects the current om of thinking on questions which are 
not only of continuing interest to the academician but are crucial also in shaping decisions in business and 
Government. The price is $3.75 ($3.25 for members of the American Statistical Association). Copies of the 1954
Proceedings may be obtained for an additional charge of only $1.00 if ordered in conjunction with these Pro-
ceedings. Please address orders to the:


AMERICAN STATISTICAL ASSOCIATION 
BEACON BUILDING

1757 K STREET, N. W. 

WASHINGTON 6, D. C. 





BIOMETRIKA 


Volume 43 Contents Parts 3 and 4, December 1956 


ROYSTON, Erica. Studies in the history of probability and statistics. III. A note on the history of the graphical presentation of data.
WILLIAMS, C. B. Studies in the history of probability and statistics. IV. A note on an early statistical study of literary style.
WALKER, A. M. A goodness of fit test for spectral distribution functions of stationary time series with normal residuals.
GANI, J. Sufficiency conditions in regular Markov chains and certain random walks.
DERMAN, C. Some asymptotic distribution theory for Markov chains with a denumerable number of states.
LAWLEY, D. N. A general method for approximating to the distribution of likelihood ratio criteria.
JAMES, G. S. On the accuracy of weighted means and ratios.
BAILEY, N. T. J. On estimating the latent and infectious periods of measles. II. Families with three or more susceptibles.
BAILEY, N. T. J. Significance tests for a variable chance of infection in chain-binomial theory.
WHITTLE, P. On the variation of yield variance with plot size.
WATSON, G. S. and WILLIAMS, E. J. On the construction of significance tests on the circle and the sphere.
QUENOUILLE, M. H. Notes on bias in estimation.
ROY, S. N. & MITRA, S. K. An introduction to some non-parametric generalizations of analysis of variance and multivariate analysis.
KAMAT, A. R. A two-sample distribution-free test.
BARTON, D. E. Addendum. The limiting distribution of Kamat’s test statistics.
RAY, W. D. Sequential analysis applied to certain experimental designs in the analysis of variance.
BROADBENT, S. R. Lognormal approximation to products and quotients.
BLISS, C. I., COCHRAN, W. G. and TUKEY, J. W. A rejection criterion based upon the range.
CROW, E. L. Confidence intervals for a proportion.
Miscellanea 

Contributions by Anscombe, F. J., Blank, A. A., Cox, D. R., David, F. N., David, H. A., Douglas,
J. B., Gilbert, N. E. G., Hortiy, N. A., Moore, P. G., Reiersøl, O., Ruben, H., Stuart, A., Taylor,
J., Watson, G. S.
Corrigenda

Gani, J.; and Tracts for Computers XXVI
Reviews    Other Books Received


The subscription price, payable in advance, is 45s. inland, 54s. export (per volume including postage). Cheques
should be drawn to Biometrika and sent to “The Secretary, Biometrika Office, Department of Statistics,
University College, London, W.C. 1.” All foreign cheques must be in sterling and drawn on a bank
having a London agency.





MATHEMATICAL REVIEWS 


A journal containing reviews of the mathematical liter-
ature of the world, with full subject and author indices


Publication of this journal is sponsored by the American Mathe- 
matical Society, Mathematical Association of America, Institute of 
Mathematical Statistics, London Mathematical Society, Edinburgh 
Mathematical Society, Union Matematica Argentina, and others 


Subscriptions accepted to cover the calendar year only. 
Issues appear monthly except July. $20.00 per year. 
Send subscription order or request for sample copy to 


AMERICAN MATHEMATICAL SOCIETY 
80 Waterman Street, Providence 6, Rhode Island 





ECONOMETRICA 


Journal of the Econometric Society 
Contents of Vol. 25, No. 1, January 1957


Arthur Smithies.....Economic Fluctuations and Growth
Tjalling C. Koopmans and Martin Beckmann.....Assignment Problems and the Location of Economic Activities
R. L. Basmann.....A Generalized Classical Method of Linear Estimation of Coefficients in a Structural Equation
Harry M. Markowitz and Alan S. Manne.....On the Solution of Discrete Programming Problems
H. Theil.....Linear Aggregation in Input-Output Analysis
G. Stuvel.....A New Index Number Formula
A. Charnes and W. W. Cooper.....Nonlinear Power of Adjacent Extreme Point Methods in Linear Programming
W. J. Berncer and Edward Saise.....Power Series Inversion of the Leontief Matrix
H. Uzawa.....Note on the Rational Selection of Decision Functions
S. Fujino.....A Theory of Economic Fluctuations in a Capitalist Economy: Economics of Cycles and
Growth, by Michio Morishima (A Review Article)

Book Reviews 


Flow of Funds in the United States 1939-1953 (Board of Governors of the Federal Reserve System). Review 
by Gerhard Colm 

Rank Correlation Methods (M. G. Kendall). Review by Wassily Hoeffding 

Politicheska’a ekonomi’a: Uchebnik (Political Economy: Textbook) (Academy of Sciences of the USSR, Eco-
nomic Institute). Review by Eberhard Fels

A(ndrei) A(ndreevich) Markov, Isbrannye trudy: teoria chisel, teoria vero’atnostei (Markov’s Selected Works on
the Theory of Numbers and the Theory of Probability) (Yu. V. Linnik, ed.). Review by Eberhard Fels

Kurs teorii vero’atnostei (A Course in the Theory of Probability) (B. V. Gnedenko). Review by Eberhard Fels

Income of the American People (Herman P. Miller). Review by Richard Ruggles

La “Cowles Commission” ed. i modelli econometrici (Adalberto Predetti). Review by R. S. Eckaus

The Theory of Economic Growth (W. Arthur Lewis). Review by K. K. Kurihara

Interindustry Economic Studies: A Comprehensive Bibliography on Interindustry Research (Vera Riley and
Robert Loring Allen). Review by R

Analysis of Confounded Factorial Designs in Single Replications (F. E. Binet, R. T. Leslie, S. Weiner, and R.
L. Anderson). Review by O. Kempthorne


Les Cadres de l’Industrie et du Commerce en France (Francois Jaquin). Review by Walter Froehlich 
The Failures of Economics: A Diagnostic Study (Sidney Schoeffler). Review by Gregor Sebba 


JOURNAL OF THE 
ROYAL STATISTICAL SOCIETY 


Series B ( Methodological) 
Vol. XVIII, No. 1, 1956 


Some Tests of Significance with Ordered Variables.....F. N. David and N. L. Johnson. (With Discussion).
Economic Choice of the Amount of Experimentation.....P. M. Grundy, M. J. R. Healy, and D. H. Rees.
(With Discussion).
On a Test of Significance in Pearson’s Biometrika Table (No. 11).....R. A. Fisher.
A Test of Significance for an Unidentifiable Relation.....P. A. P. Moran.
Confidence Limits for the Gradient in the Linear Functional Relationship.....Monica A. Creasy.
Algebraic Theory of the Computing Routine for Tests of Significance on the Dimensionality of Normal
Multivariate Systems.....F. E. Binet and G. S. Watson.
Some Notes on Ordered Random Intervals.....D. E. Barton and F. N. David.
A Sequential Test for Randomness.....D. J. Bartholomew.
Partitions in More Than One Dimension
On the Estimation of Small Frequencies in Contingency Tables.....I. J. Good.
An Elementary Method of Solution of the Queueing Problem with a Single Server and Constant Parameters
.....D. G. Champernowne.
Random Queueing Processes with Phase-type Service
The Within-Animal Bioassay with Quantal Responses.....P. J. Claringbold.


The Royal Statistical Society, 21, Bentinck Street, London, W. 1 





SANKHYA 


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 


Vol. 17, Part 2, 1956 


On the Recovery of Inter Block Information in Varietal Trials.....C. Radhakrishna Rao
Classification and Analysis of Linked Block Designs.....J. Roy and R. G. Laha
On the Dual of a PBIB Design and a New Class of Designs with Two Replications.....C. S. Ramakrishnan
Fractional Replication in Asymmetrical Factorial Designs and Partially Balanced Arrays.....I. M. Chakravarti
A General Class of Quasifactorial and Related Designs.....C. Radhakrishna Rao
Two Associate Partially Balanced Designs Involving Three Replications.....J. Roy and R. G. Laha
Some Series of Balanced Incomplete Block Designs.....D. A. Sprott
The Concept of Asymptotic Efficiency.....D. Basu
A Note on the Determination of Optimum Probabilities in Sampling without Replacement.....Des Raj


Annual Subscription: 30 rupees ($10.00), 10 rupees ($3.50) per issue.
Back Numbers: 45 rupees ($15.00) per volume; 12/8 rupees ($4.50) per issue.
Subscriptions and orders for back numbers should be sent to 


STATISTICAL PUBLISHING SOCIETY 
204/1 Barrackpore Trunk Road Calcutta 35, India 







